Methods and systems for barcoding nucleic acid molecules from individual cells or cell populations

ABSTRACT

The present disclosure provides methods, compositions and systems for analyzing individual cells or cell populations through a partitioned analysis of contents of individual cells or cell populations, such as cancer cells and cells of the immune system. Individual cells or cell populations may be co-partitioned with processing reagents for accessing cellular contents, and for uniquely identifying the content of a given cell or cell population, and subsequently analyzing the content of the cell and characterizing it as having derived from an individual cell or cell population, including analysis and characterization of nucleic acid(s) from the cell through sequencing.

CROSS-REFERENCE

This application is a Continuation Application of International Patent Application PCT/US2017/057269, filed Oct. 18, 2017, which claims the benefit of U.S. Provisional Patent Application No. 62/410,326, filed Oct. 19, 2016, and U.S. Provisional Patent Application No. 62/490,546, filed Apr. 26, 2017, each of which is incorporated herein by reference in its entirety for all purposes.

SEQUENCE LISTING

The instant application contains a Sequence Listing which has been submitted electronically in ASCII format and is hereby incorporated by reference in its entirety. Said ASCII copy, created on Nov. 9, 2017, is named 43487_745_301 SL.txt and is 44,113 bytes in size.

BACKGROUND

Significant advances in analyzing and characterizing biological and biochemical materials and systems have led to unprecedented advances in understanding the mechanisms of life, health, disease and treatment. Among these advances, technologies that target and characterize the genomic makeup of biological systems have yielded some of the most groundbreaking results, including advances in the use and exploitation of genetic amplification technologies, and nucleic acid sequencing technologies.

Nucleic acid sequencing can be used to obtain information in a wide variety of biomedical contexts, including diagnostics, prognostics, biotechnology, and forensic biology. Sequencing may involve methods including Maxam-Gilbert sequencing and chain-termination methods, or de novo sequencing methods including shotgun sequencing and bridge PCR, or next-generation methods including polony sequencing, 454 pyrosequencing, Illumina sequencing, SOLiD sequencing, Ion Torrent semiconductor sequencing, HeliScope single molecule sequencing, SMRT® sequencing, and others. Nucleic acid sequencing technologies, including next-generation DNA sequencing, have been useful for genomic and proteomic analysis of cell populations.

SUMMARY

Recognized herein is the need for methods, compositions and systems for analyzing genomic and proteomic information from individual cells or a small population of cells. Such cells include, but are not limited to, cancer cells, fetal cells, and immune cells involved in immune responses. Provided herein are methods, compositions and systems for analyzing individual cells or a small population of cells, including the analysis and attribution of nucleic acids from and to these individual cells or cell populations.

In an aspect, the present disclosure provides a method for nucleic acid sequencing, comprising (a) providing a plurality of droplets, wherein a droplet of the plurality of droplets comprises (i) a ribonucleic acid (RNA) molecule comprising a nucleic acid sequence, and (ii) a bead comprising a nucleic acid barcode molecule coupled thereto, wherein the nucleic acid barcode molecule comprises a barcode sequence; (b) using the RNA molecule and the nucleic acid barcode molecule to generate a barcoded nucleic acid molecule comprising, from a 5′ end to a 3′ end, a sequence corresponding to the nucleic acid sequence of the RNA molecule and a complement of the barcode sequence; and (c) sequencing the barcoded nucleic acid molecule or a derivative thereof.
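The 5′ to 3′ layout recited in (b) can be illustrated with a minimal sketch. It assumes, for illustration only, that the sequence corresponding to the RNA is its reverse complement (as in a first-strand cDNA) and that the barcode complement is appended by copying the barcode molecule as a template; the primer and any non-templated nucleotides are omitted. The function names and the toy sequences are hypothetical and do not come from the disclosure.

```python
# Map each base to its DNA complement; U (RNA) pairs with A.
COMPLEMENT = str.maketrans("ACGTU", "TGCAA")


def reverse_complement(seq: str) -> str:
    """Return the DNA reverse complement of a DNA or RNA sequence."""
    return seq.upper().translate(COMPLEMENT)[::-1]


def barcoded_molecule(rna: str, barcode_molecule: str) -> str:
    """Model a barcoded strand written 5' to 3'.

    Assumption: the strand begins with the reverse complement of the RNA
    and ends with the reverse complement of the barcode molecule.
    """
    return reverse_complement(rna) + reverse_complement(barcode_molecule)


# Toy inputs (hypothetical sequences).
rna = "AUGGCC"
barcode = "ACGTTG"
product = barcoded_molecule(rna, barcode)
```

In this toy case the product ends with the reverse complement of the barcode, so the barcode can be recovered from a read of the product by reverse complementing its 3′ portion.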

In some embodiments, the RNA molecule is from a cell. In some embodiments, the droplet comprises the cell. In some embodiments, the method further comprises releasing the RNA molecule from the cell prior to (b).

In some embodiments, the bead comprises a plurality of nucleic acid molecules coupled thereto, wherein the plurality of nucleic acid molecules comprises the nucleic acid barcode molecule.

In some embodiments, each of the plurality of nucleic acid molecules comprises the barcode sequence. In some embodiments, each of the plurality of nucleic acid molecules comprises an additional barcode sequence that varies across the plurality of nucleic acid molecules.
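One way to picture the two barcode levels described above is a small counting sketch: a barcode sequence shared by every molecule on a bead groups reads by bead, while the additional barcode sequence that varies across molecules distinguishes individual molecules within that group. The read representation as `(barcode, additional_barcode)` pairs and the function name are assumptions made for illustration.

```python
from collections import defaultdict


def count_molecules(reads):
    """Count distinct additional barcodes observed per shared barcode.

    `reads` is an iterable of (barcode, additional_barcode) pairs.
    Reads repeating the same pair are treated as copies of one molecule.
    """
    seen = defaultdict(set)
    for barcode, additional_barcode in reads:
        seen[barcode].add(additional_barcode)
    return {barcode: len(extras) for barcode, extras in seen.items()}


# Toy reads (hypothetical sequences): the first two are duplicates.
reads = [("AAAC", "GT"), ("AAAC", "GT"), ("AAAC", "CA"), ("TTTG", "GT")]
counts = count_molecules(reads)
```

Here the duplicated pair collapses to a single molecule, so the first barcode is credited with two molecules rather than three.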

In some embodiments, the nucleic acid barcode molecule comprises a template switching sequence.

In some embodiments, the method further comprises, prior to (c), subjecting the barcoded nucleic acid molecule or derivative thereof to nucleic acid amplification. In some embodiments, the nucleic acid amplification is performed subsequent to releasing the barcoded nucleic acid molecule or derivative thereof from the droplet. In some embodiments, the nucleic acid amplification is polymerase chain reaction. In some embodiments, the RNA molecule is a messenger ribonucleic acid (mRNA) molecule.

In some embodiments, in (a) the droplet comprises (i) an additional nucleic acid molecule comprising an additional nucleic acid sequence, and (ii) an additional nucleic acid barcode molecule comprising an additional barcode sequence, and wherein in (b) the additional nucleic acid molecule and the additional nucleic acid barcode molecule are used to generate an additional barcoded nucleic acid molecule comprising, from a 5′ end to a 3′ end, the additional barcode sequence and an additional sequence corresponding to the additional nucleic acid sequence. In some embodiments, the additional nucleic acid barcode molecule is coupled to the bead. In some embodiments, the additional nucleic acid barcode molecule is coupled to an additional bead.

In some embodiments, (b) is performed in the droplet.

In some embodiments, the method further comprises releasing the barcoded nucleic acid molecule or a derivative thereof from the droplet.

In some embodiments, the barcoded nucleic acid molecule further comprises, towards a 3′ end, a functional sequence for permitting the barcoded nucleic acid molecule or a derivative thereof to couple to a flow cell of a sequencer.

In some embodiments, the sequence is a reverse complement of the nucleic acid sequence.

In some embodiments, the method further comprises, prior to (c), using the barcoded nucleic acid molecule or a derivative thereof and a pair of primers to generate a subset of nucleic acids having a target nucleic acid sequence. In some embodiments, the target nucleic acid sequence comprises a T cell receptor variable region sequence, a B cell receptor variable region sequence, or an immunoglobulin variable region sequence. In some embodiments, at least one of the pair of primers hybridizes to a constant region of a T cell receptor nucleic acid sequence, a constant region of a B cell receptor nucleic acid sequence, or a constant region of an immunoglobulin nucleic acid sequence. In some embodiments, the subset of nucleic acids or derivatives thereof are sequenced in (c).

In some embodiments, the method further comprises releasing the nucleic acid barcode molecule from the bead. In some embodiments, the nucleic acid barcode molecule is released from the bead before the barcoded nucleic acid molecule is generated. In some embodiments, the nucleic acid barcode molecule is released from the bead while the barcoded nucleic acid molecule is generated. In some embodiments, the nucleic acid barcode molecule is released from the bead after the barcoded nucleic acid molecule is generated. In some embodiments, the bead is a gel bead.

In some embodiments, the barcode sequence is a combinatorial assembly of a plurality of barcode segments. In some embodiments, the plurality of barcode segments comprises at least three segments.
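A combinatorial assembly of barcode segments can be sketched as a Cartesian product over segment pools, where one segment is drawn from each pool and the draws are concatenated in order. The pool contents and the function name below are hypothetical; the disclosure does not specify segment sequences or pool sizes here.

```python
from itertools import product


def assemble_barcodes(*segment_pools):
    """Return every barcode formed by joining one segment from each pool."""
    return ["".join(parts) for parts in product(*segment_pools)]


# Three toy segment pools (hypothetical sequences).
pool_1 = ["AC", "GT"]
pool_2 = ["AA", "CC", "GG"]
pool_3 = ["TG", "CA"]
barcodes = assemble_barcodes(pool_1, pool_2, pool_3)
```

The number of distinct barcodes is the product of the pool sizes (here 2 x 3 x 2), which is why a small number of segments can yield a large barcode library.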

In an aspect, the present disclosure provides a method for generating a labeled polynucleotide. The method comprises (a) subjecting a reaction mixture to a first amplification reaction under conditions sufficient to generate a first amplification product, wherein the reaction mixture comprises a template polynucleotide and (i) a primer having a sequence towards a 3′ end that hybridizes to the template polynucleotide, and (ii) a template switching oligonucleotide that comprises a first predefined sequence towards a 5′ end; and (b) subjecting the first amplification product to a second amplification reaction in the presence of a barcoded oligonucleotide under conditions sufficient to generate a second amplification product, wherein the barcoded oligonucleotide comprises a sequence of at least a segment of the template switching oligonucleotide and at least a second predefined sequence, wherein (i) the second amplification reaction uses the first amplification product as a template and the barcoded oligonucleotide as a primer, or (ii) the second amplification reaction uses the barcoded oligonucleotide as a template and at least a portion of the first amplification product as a primer, to generate the second amplification product, wherein the first amplification reaction and the second amplification reaction are performed within a same reaction volume. In some embodiments, the second amplification reaction uses the first amplification product as a template and the barcoded oligonucleotide as a primer. In some embodiments, the second amplification reaction uses the barcoded oligonucleotide as a template and at least a portion of the first amplification product as a primer.

In an aspect, the present disclosure provides a method for generating a labeled polynucleotide comprising (a) providing a reaction mixture in a reaction volume, wherein the reaction mixture comprises (i) a template polynucleotide, (ii) a primer comprising a sequence towards a 3′ end of the primer that hybridizes to the template polynucleotide, and (iii) a template switching oligonucleotide; (b) in the reaction volume, subjecting the reaction mixture to a first reaction under conditions sufficient to generate a first nucleic acid product comprising the primer, a reverse complement of a sequence of the template polynucleotide, and a sequence complementary to at least a portion of the template switch oligonucleotide; and (c) subjecting the first nucleic acid product to a second reaction in the reaction volume, which second reaction comprises using (i) the first nucleic acid product as a template and a barcoded oligonucleotide as a primer, which barcoded oligonucleotide comprises a sequence of at least a segment of the template switching oligonucleotide, or (ii) the barcoded oligonucleotide as a template and at least a portion of the first nucleic acid as a primer, to generate a second nucleic acid product.

In some embodiments, the template polynucleotide is obtained from a single cell. In some embodiments, the single cell is an immune cell. In some embodiments, the immune cell is a T-cell. In some embodiments, the immune cell is a B-cell. In some embodiments, the method further comprises lysing the single cell in the same reaction volume to obtain the template polynucleotide prior to generating the first amplification product in the first amplification reaction.

In some embodiments, the template polynucleotide comprises a T-cell receptor gene or gene product. In some embodiments, the template polynucleotide comprises a B-cell receptor gene or gene product. In some embodiments, the template polynucleotide is among a plurality of template polynucleotides.

In some embodiments, a concentration of the template switching oligonucleotide in the same reaction volume is at least two times that of a concentration of the barcoded oligonucleotide in the same reaction volume. In some embodiments, a concentration of the template switching oligonucleotide in the same reaction volume is at least five times that of a concentration of the barcoded oligonucleotide in the same reaction volume. In some embodiments, a concentration of the template switching oligonucleotide in the same reaction volume is at least ten times that of a concentration of the barcoded oligonucleotide in the same reaction volume. In some embodiments, a concentration of the template switching oligonucleotide in the same reaction volume is at least twenty times that of a concentration of the barcoded oligonucleotide in the same reaction volume. In some embodiments, a concentration of the template switching oligonucleotide in the same reaction volume is at least fifty times that of a concentration of the barcoded oligonucleotide in the same reaction volume. In some embodiments, a concentration of the template switching oligonucleotide in the same reaction volume is at least one hundred times that of a concentration of the barcoded oligonucleotide in the same reaction volume. In some embodiments, a concentration of the template switching oligonucleotide in the same reaction volume is at least two hundred times that of a concentration of the barcoded oligonucleotide in the same reaction volume.
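The fold thresholds recited above (two, five, ten, twenty, fifty, one hundred, and two hundred times) can be checked with simple arithmetic. The sketch below assumes both concentrations are expressed in the same unit; the function name and the example concentrations are hypothetical and are not values taken from the disclosure.

```python
# Fold thresholds named in the paragraph above.
FOLD_THRESHOLDS = (2, 5, 10, 20, 50, 100, 200)


def highest_fold_threshold(tso_conc: float, barcoded_conc: float):
    """Return the largest recited threshold met by the concentration ratio.

    Both concentrations must use the same unit. Returns None when the
    ratio is below the smallest threshold.
    """
    ratio = tso_conc / barcoded_conc
    met = [fold for fold in FOLD_THRESHOLDS if ratio >= fold]
    return max(met) if met else None


# Hypothetical concentrations in the same unit: a ratio of exactly 50.
example = highest_fold_threshold(25.0, 0.5)
```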

In some embodiments, the primer comprises a sequence towards a 5′ end that does not specifically hybridize to the template polynucleotide.

In some embodiments, the first amplification reaction is facilitated using an enzyme comprising polymerase activity. In some embodiments, the enzyme is a DNA-dependent polymerase. In some embodiments, the enzyme is a reverse transcriptase.

In some embodiments, the second amplification reaction is facilitated using an enzyme comprising polymerase activity. In some embodiments, the enzyme is a DNA-dependent polymerase.

In some embodiments, the first amplification reaction comprises polymerase chain reaction. In some embodiments, the first amplification reaction comprises reverse transcription. In some embodiments, the second amplification reaction comprises polymerase chain reaction.

In some embodiments, the first amplification reaction and the second amplification reaction are performed sequentially in the absence of an intervening purification step.

In some embodiments, the template switching oligonucleotide is not available for primer extension during the second amplification reaction.

In some embodiments, the method further comprises degrading the template switching oligonucleotide prior to the second amplification reaction. In some embodiments, the template switching oligonucleotide comprises ribonucleic acids (RNA). In some embodiments, the template switching oligonucleotide comprises at least 10% ribonucleic acids (RNA).

In some embodiments, the method further comprises degrading the template switching oligonucleotide during the second amplification reaction. In some embodiments, the template switching oligonucleotide comprises ribonucleic acids (RNA). In some embodiments, the template switching oligonucleotide comprises at least 10% ribonucleic acids (RNA).

In some embodiments, a first reaction rate of the second amplification reaction using the barcoded oligonucleotide is greater than a second reaction rate of the second amplification using the template switching oligonucleotide.

In some embodiments, the first amplification product and the barcoded oligonucleotide have a higher melting temperature as compared to a melting temperature of the first amplification product and the template switching oligonucleotide. In some embodiments, a primer annealing temperature of the second amplification reaction is at least 0.5° C. greater than a primer annealing temperature of the first amplification reaction.

In some embodiments, the template switching oligonucleotide comprises modified nucleotides. In some embodiments, the template switching oligonucleotide comprises at least 10% modified nucleotides. In some embodiments, the template switching oligonucleotide comprises modified nucleotides selected from unlocked nucleic acids (UNAs), locked nucleic acids (LNAs), and 5-hydroxybutynl-2′-deoxyuridine.

In some embodiments, the barcoded oligonucleotide comprises modified nucleotides. In some embodiments, the barcoded oligonucleotide comprises at least 10% modified nucleotides. In some embodiments, the barcoded oligonucleotide comprises modified nucleotides selected from locked nucleic acids (LNAs), unlocked nucleic acids (UNAs), and 5-hydroxybutynl-2′-deoxyuridine.

In some embodiments, the same reaction volume comprises an emulsion, a droplet, or a microwell.

In some embodiments, the first defined sequence comprises at least one of an adaptor sequence, a barcode sequence, a unique molecular identifier sequence, a primer binding site, and a sequencing primer binding site. In some embodiments, the second defined sequence comprises at least one of an adaptor sequence, a barcode sequence, a unique molecular identifier sequence, a primer binding site, and a sequencing primer binding site.

In some embodiments, the primer is among a plurality of primers. In some embodiments, the sequence towards the 3′ end of the primer comprises a random sequence. In some embodiments, the sequence towards the 3′ end of the primer comprises a gene specific sequence. In some embodiments, the sequence towards the 3′ end of the primer comprises a polyA sequence.

In some embodiments, the template switching oligonucleotide is among a plurality of template switching oligonucleotides. In some embodiments, the barcoded oligonucleotide is among a plurality of barcoded oligonucleotides.

In some embodiments, the method further comprises subjecting the second amplification product to sequencing.

In some embodiments, the barcoded oligonucleotide is releasably coupled to a microcapsule. In some embodiments, the method further comprises releasing the barcoded oligonucleotide from the microcapsule. In some embodiments, the barcoded oligonucleotide is released from the microcapsule upon application of a stimulus. In some embodiments, the stimulus is at least one of a biological stimulus, a chemical stimulus, a thermal stimulus, an electrical stimulus, a magnetic stimulus, a photo stimulus, or any combination thereof. In some embodiments, the microcapsule is a degradable microcapsule and releasing the barcoded oligonucleotide comprises degrading the microcapsule. In some embodiments, the microcapsule comprises a polymer gel. In some embodiments, the polymer gel is a polyacrylamide. In some embodiments, the microcapsule comprises a bead. In some embodiments, the bead is a gel bead. In some embodiments, the microcapsule comprises a chemical cross-linker. In some embodiments, the chemical cross-linker is a disulfide bond.

In an aspect, the present disclosure provides a method comprising (a) providing a reaction volume comprising (i) a cell or cell derivative, and (ii) a bead comprising a barcoded oligonucleotide releasably coupled thereto, wherein the barcoded oligonucleotide is a template switching oligonucleotide; and (b) releasing the barcoded oligonucleotide from the bead to provide the barcoded oligonucleotide in the reaction volume at a concentration of at least about 0.20 μM; and (c) subjecting the reaction volume to an amplification reaction to generate an amplification product, wherein during the amplification reaction, the reaction volume comprises a template polynucleotide from the cell or cell derivative, the barcoded oligonucleotide and a primer having a sequence towards a 3′ end that hybridizes to the template polynucleotide, and wherein the amplification product has sequence complementarity with the template polynucleotide and the barcoded oligonucleotide.

In an aspect, the present disclosure provides a method comprising (a) providing a reaction volume comprising a cell and a microcapsule comprising a barcoded oligonucleotide releasably coupled thereto, wherein the barcoded oligonucleotide is a template switching oligonucleotide; and (b) subjecting the reaction volume to dissociation conditions sufficient to release the barcoded oligonucleotide from the microcapsule, thereby providing the barcoded oligonucleotide in the reaction volume at a concentration of at least about 0.20 μM; and (c) subjecting the reaction volume to an amplification reaction to generate an amplification product, wherein during the amplification reaction, the reaction volume comprises a template polynucleotide from the cell, the barcoded oligonucleotide and a primer having a sequence towards a 3′ end that hybridizes to the template polynucleotide, and wherein the amplification product has sequence complementarity with the template polynucleotide and the barcoded oligonucleotide.

In some embodiments, the method further comprises subjecting the amplification product to sequencing. In some embodiments, the barcoded oligonucleotide does not hybridize to the template polynucleotide. In some embodiments, the template polynucleotide is an mRNA molecule. In some embodiments, the method further comprises subjecting the reaction volume to a second amplification reaction to generate an additional amplification product using the amplification product as a template.

In some embodiments, the method further comprises subjecting the additional amplification product to sequencing.

In some embodiments, the cell is a mammalian cell. In some embodiments, the cell is an immune cell. In some embodiments, the immune cell is a B-cell. In some embodiments, the immune cell is a T-cell. In some embodiments, the cell is a cancer cell. In some embodiments, the cancer cell is obtained from a tissue sample. In some embodiments, the cancer cell is obtained from a biological fluid. In some embodiments, the biological fluid comprises blood. In some embodiments, the biological fluid comprises lymph fluid.

In some embodiments, the template polynucleotide comprises a T cell receptor gene sequence, a B cell receptor gene sequence, or an immunoglobulin gene sequence. In some embodiments, the template polynucleotide is a T cell receptor mRNA molecule, a B cell receptor mRNA molecule, or an immunoglobulin mRNA molecule.

In some embodiments, the reaction volume further comprises an enzyme. In some embodiments, the enzyme is a DNA polymerase. In some embodiments, the enzyme is a reverse transcriptase.

In some embodiments, the reaction volume further comprises at least one reagent for nucleic acid amplification. In some embodiments, the at least one reagent comprises dNTPs. In some embodiments, the at least one reagent comprises oligonucleotide primers.

In some embodiments, the microcapsule comprises a polymer gel. In some embodiments, the polymer gel is a polyacrylamide. In some embodiments, the microcapsule comprises a bead. In some embodiments, the bead is a gel bead. In some embodiments, the microcapsule comprises a chemical cross-linker. In some embodiments, the chemical cross-linker is a disulfide bond. In some embodiments, the dissociation condition is at least one of a biological stimulus, a chemical stimulus, a thermal stimulus, an electrical stimulus, a magnetic stimulus, a photo stimulus, or any combination thereof.

In some embodiments, the barcoded oligonucleotide comprises at least one of an adaptor sequence, a barcode sequence, a unique molecular identifier sequence, a primer binding site, and a sequencing primer binding site.

In some embodiments, the same reaction volume comprises an emulsion, a droplet, or a microwell.

In some embodiments, the method further comprises performing a third reaction, wherein the third reaction specifically amplifies variable region cDNAs, wherein the variable region cDNAs are derived from a T cell receptor cDNA, a B cell receptor cDNA, or an immunoglobulin cDNA. In some embodiments, the third reaction comprises use of a primer that specifically binds in the constant region of the T cell receptor cDNA, B cell receptor cDNA, or immunoglobulin cDNA, and extends through the variable region of the T cell receptor cDNA, B cell receptor cDNA, or immunoglobulin cDNA. In some embodiments, the third reaction results in an enrichment product that comprises (a) at least one of a T cell receptor variable region sequence, a B cell receptor variable region sequence, and an immunoglobulin variable region sequence, and (b) at least one of an adaptor sequence, a barcode sequence, a unique molecular identifier sequence, a primer binding site, and a sequencing primer binding site. In some embodiments, greater than about 25% of reads in a subsequent short-read sequencing reaction map to a T cell receptor, a B cell receptor, or an immunoglobulin gene.
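The read-mapping criterion in the last sentence above (greater than about 25% of reads mapping to a T cell receptor, B cell receptor, or immunoglobulin gene) reduces to an on-target fraction. The sketch below assumes each mapped read has already been assigned a gene label by an upstream aligner; the gene labels, the target set, and the function name are hypothetical placeholders.

```python
def on_target_fraction(mapped_genes, target_genes):
    """Return the fraction of mapped reads whose gene is in `target_genes`.

    `mapped_genes` holds one gene label per mapped read.
    Returns 0.0 when there are no mapped reads.
    """
    if not mapped_genes:
        return 0.0
    hits = sum(1 for gene in mapped_genes if gene in target_genes)
    return hits / len(mapped_genes)


# Toy data (hypothetical labels): two of four reads map to target genes.
mapped = ["TRAC", "TRBC1", "ACTB", "GAPDH"]
targets = {"TRAC", "TRBC1"}
fraction = on_target_fraction(mapped, targets)
meets_threshold = fraction > 0.25
```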

In an aspect, the present disclosure provides a non-transitory computer-readable medium comprising machine-executable code that, upon execution by one or more computer processors, implements a method for nucleic acid sequencing, comprising (a) providing a plurality of droplets, wherein a droplet of the plurality of droplets comprises (i) a ribonucleic acid (RNA) molecule comprising a nucleic acid sequence, and (ii) a bead comprising a nucleic acid barcode molecule coupled thereto, wherein the nucleic acid barcode molecule comprises a barcode sequence; (b) using the RNA molecule and the nucleic acid barcode molecule to generate a barcoded nucleic acid molecule comprising, from a 5′ end to a 3′ end, a sequence corresponding to the nucleic acid sequence of the RNA molecule and a complement of the barcode sequence; and (c) sequencing the barcoded nucleic acid molecule or a derivative thereof.

In an aspect, the present disclosure provides a non-transitory computer-readable medium comprising machine-executable code that, upon execution by one or more computer processors, implements a method for generating a labeled polynucleotide, the method comprising (a) subjecting a reaction mixture to a first reaction under conditions sufficient to generate a first nucleic acid product, wherein the reaction mixture comprises (i) a template polynucleotide, (ii) a primer having a sequence towards a 3′ end that hybridizes to the template polynucleotide, and (iii) a template switching oligonucleotide, wherein the first nucleic acid product comprises the primer, a reverse complement of a sequence of the template polynucleotide, and a sequence complementary to at least a portion of the template switch oligonucleotide; and (b) subjecting the first nucleic acid product to a second reaction in the presence of a barcoded oligonucleotide under conditions sufficient to generate a second nucleic acid product, wherein the barcoded oligonucleotide comprises a sequence of at least a segment of the template switching oligonucleotide, wherein (i) the second reaction uses the first nucleic acid as a template and the barcoded oligonucleotide as a primer, or (ii) the second reaction uses the barcoded oligonucleotide as a template and at least a portion of the first nucleic acid as a primer, to generate the second nucleic acid product, wherein the first reaction and the second reaction are performed within a same reaction volume.

In an aspect, the present disclosure provides a non-transitory computer-readable medium comprising machine-executable code that, upon execution by one or more computer processors, implements a method for generating a labeled polynucleotide. The method comprises (a) subjecting a reaction mixture to a first amplification reaction under conditions sufficient to generate a first amplification product, wherein the reaction mixture comprises a template polynucleotide and (i) a primer having a sequence towards a 3′ end that hybridizes to the template polynucleotide, and (ii) a template switching oligonucleotide that comprises a first predefined sequence towards a 5′ end; and (b) subjecting the first amplification product to a second amplification reaction in the presence of a barcoded oligonucleotide under conditions sufficient to generate a second amplification product, wherein the barcoded oligonucleotide comprises a sequence of at least a segment of the template switching oligonucleotide and at least a second predefined sequence, wherein (i) the second amplification reaction uses the first amplification product as a template and the barcoded oligonucleotide as a primer, or (ii) the second amplification reaction uses the barcoded oligonucleotide as a template and at least a portion of the first amplification product as a primer, to generate the second amplification product, wherein the first amplification reaction and the second amplification reaction are performed within a same reaction volume.

Additional aspects and advantages of the present disclosure will become readily apparent to those skilled in this art from the following detailed description, wherein only illustrative embodiments of the present disclosure are shown and described. As will be realized, the present disclosure is capable of other and different embodiments, and its several details are capable of modifications in various obvious respects, all without departing from the disclosure. Accordingly, the drawings and description are to be regarded as illustrative in nature, and not as restrictive.

INCORPORATION BY REFERENCE

All publications, patents, and patent applications mentioned in this specification are herein incorporated by reference to the same extent as if each individual publication, patent, or patent application was specifically and individually indicated to be incorporated by reference.

BRIEF DESCRIPTION OF THE DRAWINGS

The novel features of the invention are set forth with particularity in the appended claims. A better understanding of the features and advantages of the present invention will be obtained by reference to the following detailed description that sets forth illustrative embodiments, in which the principles of the invention are utilized, and the accompanying drawings (also “Figure” and “FIG.” herein), of which:

FIG. 1 schematically illustrates a microfluidic channel structure for partitioning individual or small groups of cells.

FIG. 2 schematically illustrates a microfluidic channel structure for co-partitioning cells and microcapsules (e.g., beads) comprising additional reagents.

FIG. 3 schematically illustrates an example process for amplification and barcoding of a cell's nucleic acids.

FIG. 4 provides a schematic illustration of use of barcoding of a cell's nucleic acids in attributing sequence data to individual cells or groups of cells for use in their characterization.

FIG. 5 provides a schematic illustration of cells associated with labeled cell-binding ligands.

FIG. 6 provides a schematic illustration of an example workflow for performing RNA analysis using the methods described herein.

FIG. 7 provides a schematic illustration of an example barcoded oligonucleotide structure for use in analysis of ribonucleic acid (RNA) using the methods described herein.

FIG. 8 provides an image of individual cells co-partitioned along with individual barcode bearing beads.

FIGS. 9A-9E provide schematic illustrations of example barcoded oligonucleotide structures for use in analysis of RNA and example operations for performing RNA analysis. FIG. 9A discloses SEQ ID NOS 167 and 167, respectively, in order of appearance. FIG. 9B discloses SEQ ID NOS 167, 167 and 167, respectively, in order of appearance. FIG. 9C discloses SEQ ID NOS 167 and 167, respectively, in order of appearance. FIG. 9D discloses SEQ ID NOS 167 and 167, respectively, in order of appearance. FIG. 9E discloses SEQ ID NOS 167 and 167, respectively, in order of appearance.

FIG. 10 provides a schematic illustration of an example barcoded oligonucleotide structure for use in example analysis of RNA and use of a sequence for in vitro transcription. FIG. 10 discloses SEQ ID NOS 167 and 167, respectively, in order of appearance.

FIG. 11 provides a schematic illustration of an example barcoded oligonucleotide structure for use in analysis of RNA and example operations for performing RNA analysis. FIG. 11 discloses SEQ ID NOS 168-169 and 168-169, respectively, in order of appearance.

FIGS. 12A-12B provide schematic illustrations of example barcoded oligonucleotide structures for use in analysis of RNA.

FIGS. 13A-13C provide illustrations of example yields from templateswitch reverse transcription and PCR in partitions.

FIGS. 14A-14B provide illustrations of example yields from reverse transcription and complementary deoxyribonucleic acid (cDNA) amplification in partitions with various cell numbers.

FIG. 15 provides an illustration of example yields from cDNA synthesis and real-time quantitative PCR at various input cell concentrations and also the effect of varying primer concentration on yield at a fixed cell input concentration.

FIG. 16 provides an illustration of example yields from in vitro transcription.

FIG. 17 shows an example computer control system that is programmed or otherwise configured to implement methods provided herein.

FIG. 18 provides a schematic illustration of an example barcoded oligonucleotide structure.

FIGS. 19A and 19B show example operations for performing RNA analysis. FIG. 19A discloses SEQ ID NOS 167, 167 and 167, respectively, in order of appearance. FIG. 19B discloses SEQ ID NOS 168-169, 168-169, 169, 169 and 169, respectively, in order of appearance.

FIG. 20 shows a schematic for enriching VDJ sequences from immune molecules such as TCRs, BCRs, and immunoglobulins. FIG. 20 discloses SEQ ID NOS 167 and 167, respectively, in order of appearance.

FIGS. 21A-21C show enrichment of target sequences (A) after cDNA amplification, (B) after enrichment, and (C) after sequencing library preparation.

FIG. 22 shows cDNA yields from 12,000; 6,000; or 3,000 cells using gel-beads in an emulsion-reverse transcription reaction (GEM-RT).

FIG. 23 shows sequencing results from cDNA that has been enriched using constant region primers compared to unenriched cDNA.

FIG. 24 shows cDNA yields obtained using differing concentrations of a template switch oligo (TSO).

FIGS. 25A and 25B show cDNA yields from TSO immobilized to gel beads (GB-TSO) using either 6,000 primary T cells (A) or 2,200 Jurkat cells (B).

FIGS. 26A and 26B show cDNA yields from an enrichment using an in-solution RT reaction (A) or a GEM RT reaction (B) using nested enrichment primers.

FIGS. 27A-27C show enrichment of TCR cDNA using P7 primers only (A), variable region primers with TCR beta chain constant region primers (B), and variable region primers with TCR alpha chain constant region primers (C).

FIGS. 28A-28D show a comparison of enriched product generated with either 8 μM or 200 μM TSO gel beads using P7 primers with TCR alpha chain constant region primers (A), variable region primers with TCR beta chain constant region primers (B), variable region primers with TCR alpha chain constant region primers (C), and variable region primers with TCR beta chain constant region primers (D).

FIGS. 29A and 29B show variations of a schematic for generating labeled polynucleotides. FIG. 29A discloses SEQ ID NOS 170, 170, 170, 170 and 170, respectively, in order of appearance.

DETAILED DESCRIPTION

While various embodiments of the invention have been shown and described herein, it will be obvious to those skilled in the art that such embodiments are provided by way of example only. Numerous variations, changes, and substitutions may occur to those skilled in the art without departing from the invention. It should be understood that various alternatives to the embodiments of the invention described herein may be employed.

Where values are described as ranges, it will be understood that such disclosure includes the disclosure of all possible sub-ranges within such ranges, as well as specific numerical values that fall within such ranges irrespective of whether a specific numerical value or specific sub-range is expressly stated.

The term “barcode,” as used herein, generally refers to a label, or identifier, that can be part of an analyte to convey information about the analyte. A barcode can be a tag attached to an analyte (e.g., nucleic acid molecule) or a combination of the tag in addition to an endogenous characteristic of the analyte (e.g., size of the analyte or end sequence(s)). The barcode may be unique. Barcodes can have a variety of different formats, for example, barcodes can include: polynucleotide barcodes; random nucleic acid and/or amino acid sequences; and synthetic nucleic acid and/or amino acid sequences. A barcode can be attached to an analyte in a reversible or irreversible manner. The barcode can be added to, for example, a fragment of a deoxyribonucleic acid (DNA) or ribonucleic acid (RNA) sample before, during, and/or after sequencing of the sample. Barcodes can allow for identification and/or quantification of individual sequencing-reads in real time. In some examples, the barcode is generated in a combinatorial manner. Barcodes that may be used with methods, devices and systems of the present disclosure, including methods for forming such barcodes, are described in, for example, U.S. Patent Pub. No. 2014/0378350, which is entirely incorporated herein by reference.
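As a minimal illustrative sketch of generating barcodes in a combinatorial manner, a barcode library may be formed by joining one segment drawn from each of several small segment pools. The segment sequences, pool sizes, and the function name `combinatorial_barcodes` below are hypothetical and are not taken from the present disclosure or from the incorporated reference.

```python
from itertools import product


def combinatorial_barcodes(*segment_pools):
    """Return every barcode formed by joining one segment from each pool, in pool order."""
    return ["".join(parts) for parts in product(*segment_pools)]


# Hypothetical 4-nucleotide segments, chosen for illustration only.
pool_a = ["ACGT", "TGCA", "GATC"]
pool_b = ["CCAA", "GGTT"]
pool_c = ["ATAT", "CGCG"]

barcodes = combinatorial_barcodes(pool_a, pool_b, pool_c)
print(len(barcodes))  # 3 * 2 * 2 = 12 distinct barcodes
print(barcodes[0])    # ACGTCCAAATAT
```

In this sketch the number of distinct barcodes is the product of the pool sizes, so a large barcode library may be obtained from a comparatively small number of segments.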

The term “subject,” as used herein, generally refers to an animal, such as a mammalian species (e.g., human) or avian (e.g., bird) species, or other organism, such as a plant. The subject can be a vertebrate, a mammal, a mouse, a primate, a simian or a human. Animals may include, but are not limited to, farm animals, sport animals, and pets. A subject can be a healthy individual, an individual that has or is suspected of having a disease or a pre-disposition to the disease, or an individual that is in need of therapy or suspected of needing therapy. A subject can be a patient.

The term “genome,” as used herein, generally refers to an entirety of a subject's hereditary information. A genome can be encoded either in DNA or in RNA. A genome can comprise coding regions that code for proteins as well as non-coding regions. A genome can include the sequence of all chromosomes together in an organism. For example, the human genome has a total of 46 chromosomes. The sequence of all of these together may constitute a human genome.

The terms “adaptor(s)”, “adapter(s)” and “tag(s)” may be used synonymously. An adaptor or tag can be coupled to a polynucleotide sequence to be “tagged” by any approach including ligation, hybridization, or other approaches.

The term “sequencing,” as used herein, generally refers to methods and technologies for determining the sequence of nucleotide bases in one or more polynucleotides. The polynucleotides can be, for example, deoxyribonucleic acid (DNA) or ribonucleic acid (RNA), including variants or derivatives thereof (e.g., single stranded DNA). Sequencing can be performed by various systems currently available, such as, without limitation, a sequencing system by Illumina, Pacific Biosciences, Oxford Nanopore, or Life Technologies (Ion Torrent). Such devices may provide a plurality of raw genetic data corresponding to the genetic information of a subject (e.g., human), as generated by the device from a sample provided by the subject. In some situations, systems and methods provided herein may be used with proteomic information.

The term “variant,” as used herein, generally refers to a genetic variant, such as a nucleic acid molecule comprising a polymorphism. A variant can be a structural variant or copy number variant, which can be genomic variants that are larger than single nucleotide variants or short indels. A variant can be an alteration or polymorphism in a nucleic acid sample or genome of a subject. Single nucleotide polymorphisms (SNPs) are a form of polymorphisms. Polymorphisms can include single nucleotide variations (SNVs), insertions, deletions, repeats, small insertions, small deletions, small repeats, structural variant junctions, variable length tandem repeats, and/or flanking sequences. Copy number variants (CNVs), transversions and other rearrangements are also forms of genetic variation. A genomic alteration may be a base change, insertion, deletion, repeat, copy number variation, or transversion.

The term “bead,” as used herein, generally refers to a particle. The bead may be a solid or semi-solid particle. The bead may be a gel. The bead may be formed of a polymeric material. The bead may be magnetic or non-magnetic.

The term “sample,” as used herein, generally refers to a biological sample of a subject. The sample may be a tissue sample, such as a biopsy, core biopsy, needle aspirate, or fine needle aspirate. The sample may be a fluid sample, such as a blood sample, urine sample, or saliva sample. The sample may be a skin sample. The sample may be a cheek swab. The sample may be a plasma or serum sample. The sample may be a cell-free or cell free sample. A cell-free sample may include extracellular polynucleotides. Extracellular polynucleotides may be isolated from a bodily sample that may be selected from a group consisting of blood, plasma, serum, urine, saliva, mucosal excretions, sputum, stool and tears.

The term “primer,” as used herein, generally refers to a strand of RNA or DNA that serves as a starting point for nucleic acid (e.g., DNA) synthesis. A primer may be used in a primer extension reaction, which may be a nucleic acid amplification reaction, such as, for example, polymerase chain reaction (PCR) or reverse transcription PCR (RT-PCR). The primer may have a sequence that is capable of coupling to a nucleic acid molecule. Such sequence may be complementary to the nucleic acid molecule, such as a poly-T sequence or a predetermined sequence, or a sequence that is otherwise capable of coupling (e.g., hybridizing) to the nucleic acid molecule, such as a universal primer.

Nucleic acid sequencing technologies have yielded substantial results in sequencing biological materials, including providing substantial sequence information on individual organisms, and relatively pure biological samples. However, these systems have traditionally not been effective at identifying and characterizing cells at the single cell level.

Many nucleic acid sequencing technologies derive the nucleic acids that they sequence from collections of cells obtained from tissue or other samples, such as biological fluids (e.g., blood, plasma, etc.). The cells can be processed (e.g., all together) to extract the genetic material that represents an average of the population of cells, which can then be processed into sequencing ready DNA libraries that are configured for a given sequencing technology. Although often discussed in terms of DNA or nucleic acids, the nucleic acids derived from the cells may include DNA, or RNA, including, e.g., mRNA, total RNA, or the like, that may be processed to produce complementary DNA (cDNA) for sequencing. Following processing, absent a cell specific marker, attribution of genetic material as being contributed by a subset of cells or an individual cell may not be possible in such an ensemble approach.

In addition to the inability to attribute characteristics to particular subsets of cells or individual cells, such ensemble sample preparation methods can be, from the outset, predisposed to primarily identifying and characterizing the majority constituents in the sample of cells, and may not be designed to pick out the minority constituents, e.g., genetic material contributed by one cell, a few cells, or a small percentage of total cells in the sample. Likewise, where analyzing expression levels, e.g., of mRNA, an ensemble approach can be predisposed to presenting potentially inaccurate data from cell populations that are non-homogeneous in terms of expression levels. In some cases, where expression is high in a small minority of the cells in an analyzed population, and absent in the majority of the cells of the population, an ensemble method may indicate low level expression for the entire population.
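The averaging effect described above may be illustrated with a short arithmetic sketch. The transcript counts and cell numbers are hypothetical values chosen only for illustration and do not represent measured data.

```python
# Hypothetical transcript counts for one gene across 100 cells:
# 5 cells express the gene strongly and 95 cells do not express it at all.
per_cell_counts = [200] * 5 + [0] * 95

# An ensemble measurement effectively reports the population average.
ensemble_mean = sum(per_cell_counts) / len(per_cell_counts)

# A per-cell measurement can also report how many cells express the gene.
expressing_fraction = sum(1 for c in per_cell_counts if c > 0) / len(per_cell_counts)

print(ensemble_mean)        # 10.0
print(expressing_fraction)  # 0.05
```

Here the ensemble value of 10 counts per cell suggests uniform low level expression, whereas the per-cell view shows that 5% of the cells carry all of the signal.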

These inaccuracies can be further magnified through processing operations used in generating the sequencing libraries from these samples. In particular, many next generation sequencing technologies (e.g., massively parallel sequencing) may rely upon the geometric amplification of nucleic acid fragments, such as via polymerase chain reaction, in order to produce sufficient DNA for the sequencing library. However, such amplification can be biased toward amplification of majority constituents in a sample, and may not preserve the starting ratios of such minority and majority components. While some of these difficulties may be addressed by utilizing different sequencing systems, such as single molecule systems that do not require amplification, the single molecule systems, as well as the ensemble sequencing methods of other next generation sequencing systems, can also have large input DNA requirements. Some single molecule sequencing systems, for example, can have sample input DNA requirements of from 500 nanograms (ng) to upwards of 10 micrograms (μg), which may not be obtainable from individual cells or even small subpopulations of cells. Likewise, other NGS systems can be optimized for starting amounts of sample DNA in the sample of from approximately 50 ng to about 1 μg.
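One way to see how geometric amplification can distort starting ratios is the following minimal sketch. It assumes an idealized model in which each cycle multiplies the copy number by (1 + efficiency), and it assumes a slightly lower per-cycle efficiency for the minority constituent. The model, the copy numbers, the efficiencies, and the cycle count are all hypothetical assumptions made for illustration and are not values from the present disclosure.

```python
def amplify(copies, efficiency, cycles):
    """Idealized geometric amplification: each cycle multiplies copies by (1 + efficiency)."""
    return copies * (1.0 + efficiency) ** cycles


def minority_fraction(majority, minority):
    """Fraction of all molecules that belong to the minority constituent."""
    return minority / (majority + minority)


# Hypothetical starting material: 900 majority and 100 minority template copies.
majority_start, minority_start = 900.0, 100.0
cycles = 20

# Assumed per-cycle efficiencies (hypothetical): the minority amplifies slightly less well.
majority_end = amplify(majority_start, 0.95, cycles)
minority_end = amplify(minority_start, 0.85, cycles)

before = minority_fraction(majority_start, minority_start)
after = minority_fraction(majority_end, minority_end)
print(f"minority fraction before: {before:.3f}, after: {after:.3f}")
```

Under these assumptions a small per-cycle difference compounds over the cycles, so the minority fraction after amplification is lower than the starting fraction.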

Disclosed herein are methods and systems for characterizing nucleic acids from small populations of cells, and in some cases, for characterizing nucleic acids from individual cells. The methods described herein may compartmentalize the analysis of individual cells or small populations of cells, including e.g., nucleic acids from individual cells or small groups of cells, and then allow that analysis to be attributed back to the individual cell or small group of cells from which the nucleic acids were derived. This can be accomplished regardless of whether the cell population represents a 50/50 mix of cell types, a 90/10 mix of cell types, or virtually any ratio of cell types, as well as a complete heterogeneous mix of different cell types, or any mixture between these. Differing cell types may include cells from different tissue types of an individual or the same tissue type from different individuals, or biological organisms such as microorganisms from differing genera, species, strains, variants, or any combination of any or all of the foregoing. For example, differing cell types may include normal and tumor tissue from an individual, various cell types obtained from a human subject such as a variety of immune cells (e.g., B cells, T cells, and the like), multiple different bacterial species, strains and/or variants from environmental, forensic, microbiome or other samples, or any of a variety of other mixtures of cell types.

Methods and systems described herein may provide for the compartmentalization, depositing or partitioning of the nucleic acid contents of individual cells from a sample material containing cells, into discrete compartments or partitions (referred to interchangeably herein as partitions), where each partition maintains separation of its own contents from the contents of other partitions. In some examples, a partition is a droplet or well. Unique identifiers, e.g., barcodes, may be previously, subsequently or concurrently delivered to the partitions that hold the compartmentalized or partitioned cells or cellular derivatives, in order to allow for the later attribution of the characteristics of the individual cells to the particular compartment. Barcodes may be delivered, for example, in an oligonucleotide, to a partition via any suitable mechanism, such as using beads (e.g., gel beads). In some examples, cellular derivatives, such as cells or constituents of such cells in matrix (e.g., gel or polymeric matrix), are compartmentalized or partitioned in partitions (e.g., droplets or wells).
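The later attribution of sequence data to a particular partition can be sketched, under simplifying assumptions, as grouping reads by the barcode they carry. The sketch below assumes that each read begins with a fixed-length barcode followed by the sequence of interest. The barcode length, the read layout, the sequences, and the function name `group_reads_by_barcode` are hypothetical and are used only for illustration.

```python
from collections import defaultdict

BARCODE_LENGTH = 8  # hypothetical barcode length, in nucleotides


def group_reads_by_barcode(reads, barcode_length=BARCODE_LENGTH):
    """Group the non-barcode portion of each read under the barcode found at its start."""
    groups = defaultdict(list)
    for read in reads:
        barcode, insert = read[:barcode_length], read[barcode_length:]
        groups[barcode].append(insert)
    return dict(groups)


# Hypothetical reads: the first 8 nucleotides are the partition barcode.
reads = [
    "AAAACCCC" + "GATTACA",
    "GGGGTTTT" + "CATCATC",
    "AAAACCCC" + "TTGACCA",
]

by_partition = group_reads_by_barcode(reads)
for barcode, inserts in by_partition.items():
    print(barcode, inserts)
```

Reads sharing a barcode are treated as having originated from the same partition, and therefore from the same cell or small group of cells. A practical pipeline would typically also need to handle sequencing errors in the barcode, which this sketch does not attempt.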

In some embodiments, barcoded oligonucleotides are delivered to a partition via a microcapsule. In some cases, barcoded oligonucleotides are initially associated with the microcapsule and then released from the microcapsule upon application of a stimulus which allows the oligonucleotides to dissociate or to be released from the microcapsule.

A microcapsule, in some embodiments, comprises a bead. In some embodiments, a bead may be porous, non-porous, solid, semi-solid, semi-fluidic, or fluidic. In some embodiments, a bead may be dissolvable, disruptable, or degradable. In some cases, a bead may not be degradable. In some embodiments, the bead may be a gel bead. A gel bead may be a hydrogel bead. A gel bead may be formed from molecular precursors, such as a polymeric or monomeric species. A semi-solid bead may be a liposomal bead. Solid beads may comprise metals including iron oxide, gold, and silver. In some cases, the beads are silica beads. In some cases, the beads are rigid. In some cases, the beads may be flexible and/or compressible.

In some embodiments, the bead may contain molecular precursors (e.g., monomers or polymers), which may form a polymer network via polymerization of the precursors. In some cases, a precursor may be an already polymerized species capable of undergoing further polymerization via, for example, a chemical cross-linkage. In some cases, a precursor comprises one or more of an acrylamide or a methacrylamide monomer, oligomer, or polymer. In some cases, the bead may comprise prepolymers, which are oligomers capable of further polymerization. For example, polyurethane beads may be prepared using prepolymers. In some cases, the bead may contain individual polymers that may be further polymerized together. In some cases, beads may be generated via polymerization of different precursors, such that they comprise mixed polymers, co-polymers, and/or block co-polymers.

A bead may comprise natural and/or synthetic materials. For example, a polymer can be a natural polymer or a synthetic polymer. In some cases, a bead comprises both natural and synthetic polymers. Examples of natural polymers include proteins and sugars such as deoxyribonucleic acid, rubber, cellulose, starch (e.g., amylose, amylopectin), proteins, enzymes, polysaccharides, silks, polyhydroxyalkanoates, chitosan, dextran, collagen, carrageenan, ispaghula, acacia, agar, gelatin, shellac, sterculia gum, xanthan gum, corn sugar gum, guar gum, gum karaya, agarose, alginic acid, alginate, or natural polymers thereof. Examples of synthetic polymers include acrylics, nylons, silicones, spandex, viscose rayon, polycarboxylic acids, polyvinyl acetate, polyacrylamide, polyacrylate, polyethylene glycol, polyurethanes, polylactic acid, silica, polystyrene, polyacrylonitrile, polybutadiene, polycarbonate, polyethylene, polyethylene terephthalate, poly(chlorotrifluoroethylene), poly(ethylene oxide), poly(ethylene terephthalate), polyethylene, polyisobutylene, poly(methyl methacrylate), poly(oxymethylene), polyformaldehyde, polypropylene, polystyrene, poly(tetrafluoroethylene), poly(vinyl acetate), poly(vinyl alcohol), poly(vinyl chloride), poly(vinylidene dichloride), poly(vinylidene difluoride), poly(vinyl fluoride) and combinations (e.g., co-polymers) thereof. Beads may also be formed from materials other than polymers, including lipids, micelles, ceramics, glass-ceramics, material composites, metals, other inorganic materials, and others.

In some cases, a chemical cross-linker may be a precursor used to cross-link monomers during polymerization of the monomers and/or may be used to attach oligonucleotides (e.g., barcoded oligonucleotides) to the bead. In some cases, polymers may be further polymerized with a cross-linker species or other type of monomer to generate a further polymeric network. Non-limiting examples of chemical cross-linkers (also referred to as a “crosslinker” or a “crosslinker agent” herein) include cystamine, glutaraldehyde, dimethyl suberimidate, N-Hydroxysuccinimide crosslinker BS3, formaldehyde, carbodiimide (EDC), SMCC, Sulfo-SMCC, vinylsilane, N,N′-diallyltartardiamide (DATD), N,N′-Bis(acryloyl)cystamine (BAC), or homologs thereof. In some cases, the crosslinker used in the present disclosure contains cystamine.

Crosslinking may be permanent or reversible, depending upon the particular crosslinker used. Reversible crosslinking may allow for the polymer to linearize or dissociate under appropriate conditions. In some cases, reversible cross-linking may also allow for reversible attachment of a material bound to the surface of a bead. In some cases, a cross-linker may form disulfide linkages. In some cases, the chemical cross-linker forming disulfide linkages may be cystamine or a modified cystamine.

In some embodiments, disulfide linkages can be formed between molecular precursor units (e.g., monomers, oligomers, or linear polymers) or precursors incorporated into a bead and oligonucleotides. Cystamine (including modified cystamines), for example, is an organic agent comprising a disulfide bond that may be used as a crosslinker agent between individual monomeric or polymeric precursors of a bead. Polyacrylamide may be polymerized in the presence of cystamine or a species comprising cystamine (e.g., a modified cystamine) to generate polyacrylamide gel beads comprising disulfide linkages (e.g., chemically degradable beads comprising chemically-reducible cross-linkers). The disulfide linkages may permit the bead to be degraded (or dissolved) upon exposure of the bead to a reducing agent.

In some embodiments, chitosan, a linear polysaccharide polymer, may be crosslinked with glutaraldehyde via hydrophilic chains to form a bead. Crosslinking of chitosan polymers may be achieved by chemical reactions that are initiated by heat, pressure, change in pH, and/or radiation.

In some embodiments, the bead may comprise covalent or ionic bonds between polymeric precursors (e.g., monomers, oligomers, linear polymers), oligonucleotides, primers, and other entities. In some cases, the covalent bonds comprise carbon-carbon bonds or thioether bonds.

In some cases, a bead may comprise an acrydite moiety, which in certain aspects may be used to attach one or more oligonucleotides (e.g., barcode sequence, barcoded oligonucleotide, primer, or other oligonucleotide) to the bead. In some cases, an acrydite moiety can refer to an acrydite analogue generated from the reaction of acrydite with one or more species, such as the reaction of acrydite with other monomers and cross-linkers during a polymerization reaction. Acrydite moieties may be modified to form chemical bonds with a species to be attached, such as an oligonucleotide (e.g., barcode sequence, barcoded oligonucleotide, primer, or other oligonucleotide). Acrydite moieties may be modified with thiol groups capable of forming a disulfide bond or may be modified with groups already comprising a disulfide bond. The thiol or disulfide (via disulfide exchange) may be used as an anchor point for a species to be attached or another part of the acrydite moiety may be used for attachment. In some cases, attachment is reversible, such that when the disulfide bond is broken (e.g., in the presence of a reducing agent), the attached species is released from the bead. In other cases, an acrydite moiety comprises a reactive hydroxyl group that may be used for attachment.

Functionalization of beads for attachment of oligonucleotides may be achieved through a wide range of different approaches, including activation of chemical groups within a polymer, incorporation of active or activatable functional groups in the polymer structure, or attachment at the pre-polymer or monomer stage in bead production.

For example, precursors (e.g., monomers, cross-linkers) that are polymerized to form a bead may comprise acrydite moieties, such that when a bead is generated, the bead also comprises acrydite moieties. The acrydite moieties can be attached to an oligonucleotide, such as a primer (e.g., a primer for amplifying target nucleic acids, barcoded oligonucleotide, etc.) that is desired to be incorporated into the bead. In some cases, the primer comprises a P5 sequence for attachment to a sequencing flow cell for Illumina sequencing. In some cases, the primer comprises a P7 sequence for attachment to a sequencing flow cell for Illumina sequencing. In some cases, the primer comprises a barcode sequence. In some cases, the primer further comprises a unique molecular identifier (UMI). In some cases, the primer comprises an R1 primer sequence for Illumina sequencing. In some cases, the primer comprises an R2 primer sequence for Illumina sequencing.

In some cases, precursors comprising a functional group that is reactive or capable of being activated such that it becomes reactive can be polymerized with other precursors to generate gel beads comprising the activated or activatable functional group. The functional group may then be used to attach additional species (e.g., disulfide linkers, primers, other oligonucleotides, etc.) to the gel beads. For example, some precursors comprising a carboxylic acid (COOH) group can co-polymerize with other precursors to form a gel bead that also comprises a COOH functional group. In some cases, acrylic acid (a species comprising free COOH groups), acrylamide, and bis(acryloyl)cystamine can be co-polymerized together to generate a gel bead comprising free COOH groups. The COOH groups of the gel bead can be activated (e.g., via 1-Ethyl-3-(3-dimethylaminopropyl)carbodiimide (EDC) and N-Hydroxysuccinimide (NHS) or 4-(4,6-Dimethoxy-1,3,5-triazin-2-yl)-4-methylmorpholinium chloride (DMTMM)) such that they are reactive (e.g., reactive to amine functional groups where EDC/NHS or DMTMM are used for activation). The activated COOH groups can then react with an appropriate species (e.g., a species comprising an amine functional group where the carboxylic acid groups are activated to be reactive with an amine functional group) comprising a moiety to be linked to the bead.

Beads comprising disulfide linkages in their polymeric network may be functionalized with additional species via reduction of some of the disulfide linkages to free thiols. The disulfide linkages may be reduced via, for example, the action of a reducing agent (e.g., DTT, TCEP, etc.) to generate free thiol groups, without dissolution of the bead. Free thiols of the beads can then react with free thiols of a species or a species comprising another disulfide bond (e.g., via thiol-disulfide exchange) such that the species can be linked to the beads (e.g., via a generated disulfide bond). In some cases, free thiols of the beads may react with any other suitable group. For example, free thiols of the beads may react with species comprising an acrydite moiety. The free thiol groups of the beads can react with the acrydite via Michael addition chemistry, such that the species comprising the acrydite is linked to the bead. In some cases, uncontrolled reactions can be prevented by inclusion of a thiol capping agent such as N-ethylmaleimide or iodoacetate.

Activation of disulfide linkages within a bead can be controlled such that only a small number of disulfide linkages are activated. Control may be exerted, for example, by controlling the concentration of a reducing agent used to generate free thiol groups and/or concentration of reagents used to form disulfide bonds in bead polymerization. In some cases, a low concentration (e.g., molecules of reducing agent:gel bead ratios of less than about 10,000; 100,000; 1,000,000; 10,000,000; 100,000,000; 1,000,000,000; 10,000,000,000; or 100,000,000,000) of reducing agent may be used for reduction. Controlling the number of disulfide linkages that are reduced to free thiols may be useful in ensuring bead structural integrity during functionalization. In some cases, optically-active agents, such as fluorescent dyes, may be coupled to beads via free thiol groups of the beads and used to quantify the number of free thiols present in a bead and/or track a bead.

In some cases, addition of moieties to a gel bead after gel bead formation may be advantageous. For example, addition of an oligonucleotide (e.g., barcoded oligonucleotide) after gel bead formation may avoid loss of the species during chain transfer termination that can occur during polymerization. Moreover, smaller precursors (e.g., monomers or cross linkers that do not comprise side chain groups and linked moieties) may be used for polymerization and can be minimally hindered from growing chain ends due to viscous effects. In some cases, functionalization after gel bead synthesis can minimize exposure of species (e.g., oligonucleotides) to be loaded with potentially damaging agents (e.g., free radicals) and/or chemical environments. In some cases, the generated gel may possess an upper critical solution temperature (UCST) that can permit temperature driven swelling and collapse of a bead. Such functionality may aid in oligonucleotide (e.g., a primer) infiltration into the bead during subsequent functionalization of the bead with the oligonucleotide. Post-production functionalization may also be useful in controlling loading ratios of species in beads, such that, for example, the variability in loading ratio is minimized. Species loading may also be performed in a batch process such that a plurality of beads can be functionalized with the species in a single batch.

In some cases, an acrydite moiety linked to a precursor, another species linked to a precursor, or a precursor itself comprises a labile bond, such as chemically, thermally, or photo-sensitive bonds, e.g., disulfide bonds, UV sensitive bonds, or the like. Once acrydite moieties or other moieties comprising a labile bond are incorporated into a bead, the bead may also comprise the labile bond. The labile bond may be, for example, useful in reversibly linking (e.g., covalently linking) species (e.g., barcodes, primers, etc.) to a bead. In some cases, a thermally labile bond may include a nucleic acid hybridization based attachment, e.g., where an oligonucleotide is hybridized to a complementary sequence that is attached to the bead, such that thermal melting of the hybrid releases the oligonucleotide, e.g., a barcode containing sequence, from the bead or microcapsule.

The addition of multiple types of labile bonds to a gel bead may result in the generation of a bead capable of responding to varied stimuli. Each type of labile bond may be sensitive to an associated stimulus (e.g., chemical stimulus, light, temperature, etc.) such that release of species attached to a bead via each labile bond may be controlled by the application of the appropriate stimulus. Such functionality may be useful in controlled release of species from a gel bead. In some cases, another species comprising a labile bond may be linked to a gel bead after gel bead formation via, for example, an activated functional group of the gel bead as described above. As will be appreciated, barcodes that are releasably, cleavably or reversibly attached to the beads described herein include barcodes that are released or releasable through cleavage of a linkage between the barcode molecule and the bead, or that are released through degradation of the underlying bead itself, allowing the barcodes to be accessed or accessible by other reagents, or both.

The barcodes that are releasable as described herein may sometimes be referred to as being activatable, in that they are available for reaction once released. Thus, for example, an activatable barcode may be activated by releasing the barcode from a bead (or other suitable type of partition described herein). Other activatable configurations are also envisioned in the context of the described methods and systems.

In addition to thermally cleavable bonds, disulfide bonds and UV sensitive bonds, other non-limiting examples of labile bonds that may be coupled to a precursor or bead include an ester linkage (e.g., cleavable with an acid, a base, or hydroxylamine), a vicinal diol linkage (e.g., cleavable via sodium periodate), a Diels-Alder linkage (e.g., cleavable via heat), a sulfone linkage (e.g., cleavable via a base), a silyl ether linkage (e.g., cleavable via an acid), a glycosidic linkage (e.g., cleavable via an amylase), a peptide linkage (e.g., cleavable via a protease), or a phosphodiester linkage (e.g., cleavable via a nuclease (e.g., DNase)).

Species that do not participate in polymerization may also be encapsulated in beads during bead generation (e.g., during polymerization of precursors). Such species may be entered into polymerization reaction mixtures such that generated beads comprise the species upon bead formation. In some cases, such species may be added to the gel beads after formation. Such species may include, for example, oligonucleotides, reagents for a nucleic acid amplification reaction (e.g., primers, polymerases, dNTPs, co-factors (e.g., ionic co-factors)) including those described herein, reagents for enzymatic reactions (e.g., enzymes, co-factors, substrates), or reagents for nucleic acid modification reactions such as polymerization, ligation, or digestion. Trapping of such species may be controlled by the polymer network density generated during polymerization of precursors, control of ionic charge within the gel bead (e.g., via ionic species linked to polymerized species), or by the release of other species. Encapsulated species may be released from a bead upon bead degradation and/or by application of a stimulus capable of releasing the species from the bead.

Beads may be of uniform size or heterogeneous size. In some cases, the diameter of a bead may be about 1 μm, 5 μm, 10 μm, 20 μm, 30 μm, 40 μm, 50 μm, 60 μm, 70 μm, 80 μm, 90 μm, 100 μm, 250 μm, 500 μm, or 1 mm. In some cases, a bead may have a diameter of at least about 1 μm, 5 μm, 10 μm, 20 μm, 30 μm, 40 μm, 50 μm, 60 μm, 70 μm, 80 μm, 90 μm, 100 μm, 250 μm, 500 μm, 1 mm, or more. In some cases, a bead may have a diameter of less than about 1 μm, 5 μm, 10 μm, 20 μm, 30 μm, 40 μm, 50 μm, 60 μm, 70 μm, 80 μm, 90 μm, 100 μm, 250 μm, 500 μm, or 1 mm. In some cases, a bead may have a diameter in the range of about 40-75 μm, 30-75 μm, 20-75 μm, 40-85 μm, 40-95 μm, 20-100 μm, 10-100 μm, 1-100 μm, 20-250 μm, or 20-500 μm.

In certain aspects, beads are provided as a population or plurality of beads having a relatively monodisperse size distribution. Where it may be desirable to provide relatively consistent amounts of reagents within partitions, maintaining relatively consistent bead characteristics, such as size, can contribute to the overall consistency. In particular, the beads described herein may have size distributions that have a coefficient of variation in their cross-sectional dimensions of less than 50%, less than 40%, less than 30%, less than 20%, and in some cases less than 15%, less than 10%, or even less than 5%.
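As an illustration of the coefficient of variation referred to above, the following minimal Python sketch computes it as the sample standard deviation divided by the mean, expressed as a percentage. The function name and the bead diameters are hypothetical values chosen for illustration only and are not taken from this disclosure.

```python
import statistics


def coefficient_of_variation(diameters_um):
    """Return the coefficient of variation (sample std / mean) as a percentage."""
    mean = statistics.mean(diameters_um)
    return 100.0 * statistics.stdev(diameters_um) / mean


# Hypothetical bead diameters in micrometers (illustrative values only).
diameters = [52.0, 54.5, 53.1, 55.2, 51.8, 54.0]
cv = coefficient_of_variation(diameters)
print(f"CV = {cv:.1f}%")
```

A population such as this hypothetical one would fall within the "less than 5%" range recited above.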

Beads may be of any suitable shape. Examples of bead shapes include, but are not limited to, spherical, non-spherical, oval, oblong, amorphous, circular, cylindrical, and variations thereof.

In addition to, or as an alternative to the cleavable linkages between the beads and the associated molecules, e.g., barcode containing oligonucleotides, described above, the beads may be degradable, disruptable, or dissolvable spontaneously or upon exposure to one or more stimuli (e.g., temperature changes, pH changes, exposure to particular chemical species or phase, exposure to light, reducing agent, etc.). In some cases, a bead may be dissolvable, such that material components of the beads are solubilized when exposed to a particular chemical species or an environmental change, such as a change in temperature or a change in pH. In some cases, a gel bead is degraded or dissolved at elevated temperature and/or in basic conditions. In some cases, a bead may be thermally degradable such that when the bead is exposed to an appropriate change in temperature (e.g., heat), the bead degrades. Degradation or dissolution of a bead bound to a species (e.g., an oligonucleotide, e.g., barcoded oligonucleotide) may result in release of the species from the bead.

A degradable bead may comprise one or more species with a labile bond such that, when the bead/species is exposed to the appropriate stimuli, the bond is broken and the bead degrades. The labile bond may be a chemical bond (e.g., covalent bond, ionic bond) or may be another type of physical interaction (e.g., van der Waals interactions, dipole-dipole interactions, etc.). In some cases, a crosslinker used to generate a bead may comprise a labile bond. Upon exposure to the appropriate conditions, the labile bond can be broken and the bead degraded. For example, upon exposure of a polyacrylamide gel bead comprising cystamine crosslinkers to a reducing agent, the disulfide bonds of the cystamine can be broken and the bead degraded.

A degradable bead may be useful in more quickly releasing an attached species (e.g., an oligonucleotide, a barcode sequence, a primer, etc.) from the bead when the appropriate stimulus is applied to the bead as compared to a bead that does not degrade. For example, for a species bound to an inner surface of a porous bead or in the case of an encapsulated species, the species may have greater mobility and accessibility to other species in solution upon degradation of the bead. In some cases, a species may also be attached to a degradable bead via a degradable linker (e.g., disulfide linker). The degradable linker may respond to the same stimuli as the degradable bead or the two degradable species may respond to different stimuli. For example, a barcode sequence may be attached, via a disulfide bond, to a polyacrylamide bead comprising cystamine. Upon exposure of the barcoded-bead to a reducing agent, the bead degrades and the barcode sequence is released upon breakage of both the disulfide linkage between the barcode sequence and the bead and the disulfide linkages of the cystamine in the bead.

A degradable bead may be introduced into a partition, such as a droplet of an emulsion or a well, such that the bead degrades within the partition and any associated species (e.g., oligonucleotides) are released within the droplet when the appropriate stimulus is applied. The free species (e.g., oligonucleotides) may interact with other reagents contained in the partition. For example, a polyacrylamide bead comprising cystamine and linked, via a disulfide bond, to a barcode sequence, may be combined with a reducing agent within a droplet of a water-in-oil emulsion. Within the droplet, the reducing agent breaks the various disulfide bonds resulting in bead degradation and release of the barcode sequence into the aqueous, inner environment of the droplet. In another example, heating of a droplet comprising a bead-bound barcode sequence in basic solution may also result in bead degradation and release of the attached barcode sequence into the aqueous, inner environment of the droplet.

As will be appreciated from the above disclosure, while referred to as degradation of a bead, in many instances as noted above, that degradation may refer to the disassociation of a bound or entrained species from a bead, both with and without structurally degrading the physical bead itself. For example, entrained species may be released from beads through osmotic pressure differences due to, for example, changing chemical environments. By way of example, alteration of bead pore sizes due to osmotic pressure differences can generally occur without structural degradation of the bead itself. In some cases, an increase in pore size due to osmotic swelling of a bead can permit the release of entrained species within the bead. In other cases, osmotic shrinking of a bead may cause a bead to better retain an entrained species due to pore size contraction.

Where degradable beads are provided, it may be desirable to avoid exposing such beads to the stimulus or stimuli that cause such degradation prior to the desired time, in order to avoid premature bead degradation and issues that arise from such degradation, including for example poor flow characteristics and aggregation. By way of example, where beads comprise reducible cross-linking groups, such as disulfide groups, it will be desirable to avoid contacting such beads with reducing agents, e.g., DTT or other disulfide cleaving reagents. In such cases, treatment of the beads described herein will, in some cases, be provided free of reducing agents, such as DTT. Because reducing agents are often provided in commercial enzyme preparations, it may be desirable to provide reducing agent free (or DTT free) enzyme preparations in treating the beads described herein. Examples of such enzymes include, e.g., polymerase enzyme preparations, reverse transcriptase enzyme preparations, ligase enzyme preparations, as well as many other enzyme preparations that may be used to treat the beads described herein. The terms “reducing agent free” or “DTT free” preparations can refer to a preparation having less than 1/10th, less than 1/50th, and even less than 1/100th of the lower ranges for such materials used in degrading the beads. For example, for DTT, the reducing agent free preparation will typically have less than 0.01 mM, 0.005 mM, 0.001 mM DTT, 0.0005 mM DTT, or even less than 0.0001 mM DTT. In many cases, the amount of DTT will be undetectable.

In some cases, a stimulus may be used to trigger degradation of the bead, which may result in the release of contents from the bead. Generally, a stimulus may cause degradation of the bead structure, such as degradation of the covalent bonds or other types of physical interaction. These stimuli may be useful in inducing a bead to degrade and/or to release its contents. Examples of stimuli that may be used include chemical stimuli, thermal stimuli, optical stimuli (e.g., light) and any combination thereof, as described more fully below.

Numerous chemical triggers may be used to trigger the degradation of beads. Examples of these chemical changes may include, but are not limited to, pH-mediated changes to the integrity of a component within the bead, degradation of a component of a bead via cleavage of cross-linked bonds, and depolymerization of a component of a bead.

In some embodiments, a bead may be formed from materials that comprise degradable chemical crosslinkers, such as BAC or cystamine. Degradation of such degradable crosslinkers may be accomplished through a number of mechanisms. In some examples, a bead may be contacted with a chemical degrading agent that may induce oxidation, reduction or other chemical changes. For example, a chemical degrading agent may be a reducing agent, such as dithiothreitol (DTT). Additional examples of reducing agents may include β-mercaptoethanol, (2S)-2-amino-1,4-dimercaptobutane (dithiobutylamine or DTBA), tris(2-carboxyethyl)phosphine (TCEP), or combinations thereof. A reducing agent may degrade the disulfide bonds formed between gel precursors forming the bead, and thus, degrade the bead. In other cases, a change in pH of a solution, such as an increase in pH, may trigger degradation of a bead. In other cases, exposure to an aqueous solution, such as water, may trigger hydrolytic degradation, and thus degradation of the bead.

Beads may also be induced to release their contents upon the application of a thermal stimulus. A change in temperature can cause a variety of changes to a bead. For example, heat can cause a solid bead to liquefy. A change in heat may cause melting of a bead such that a portion of the bead degrades. In other cases, heat may increase the internal pressure of the bead components such that the bead ruptures or explodes. Heat may also act upon heat-sensitive polymers used as materials to construct beads.

The methods, compositions, devices, and kits of this disclosure may be used with any suitable agent to degrade beads. In some embodiments, changes in temperature or pH may be used to degrade thermo-sensitive or pH-sensitive bonds within beads. In some embodiments, chemical degrading agents may be used to degrade chemical bonds within beads by oxidation, reduction or other chemical changes. For example, a chemical degrading agent may be a reducing agent, such as DTT, wherein DTT may degrade the disulfide bonds formed between a crosslinker and gel precursors, thus degrading the bead. In some embodiments, a reducing agent may be added to degrade the bead, which may or may not cause the bead to release its contents. Examples of reducing agents may include dithiothreitol (DTT), β-mercaptoethanol, (2S)-2-amino-1,4-dimercaptobutane (dithiobutylamine or DTBA), tris(2-carboxyethyl)phosphine (TCEP), or combinations thereof. The reducing agent may be present at a concentration of about 0.1 mM, 0.5 mM, 1 mM, 5 mM, or 10 mM. The reducing agent may be present at a concentration of at least about 0.1 mM, 0.5 mM, 1 mM, 5 mM, 10 mM, or greater. The reducing agent may be present at a concentration of at most about 0.1 mM, 0.5 mM, 1 mM, 5 mM, or 10 mM.

Any suitable number of nucleic acid molecules (e.g., primer, e.g., barcoded oligonucleotide) can be associated with a bead such that, upon release from the bead, the nucleic acid molecules (e.g., primer, e.g., barcoded oligonucleotide) are present in the partition at a pre-defined concentration. Such pre-defined concentration may be selected to facilitate certain reactions for generating a sequencing library, e.g., amplification, within the partition. In some cases, the pre-defined concentration of the primer is limited by the process of producing oligonucleotide bearing beads.
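The relationship between the number of molecules carried by a bead and the resulting concentration in a partition can be sketched as follows. This is a minimal illustration that assumes complete release and uniform mixing within the partition; the function name, the loading of 1e8 molecules per bead, and the 1000 pL partition volume are hypothetical and are not values stated in this disclosure.

```python
AVOGADRO = 6.02214076e23  # molecules per mole


def released_concentration_nM(molecules_per_bead, partition_volume_pL):
    """Concentration (nM) after all molecules are released into the partition.

    Assumes complete release and uniform mixing (illustrative assumption).
    """
    volume_L = partition_volume_pL * 1e-12  # 1 pL = 1e-12 L
    moles = molecules_per_bead / AVOGADRO
    return moles / volume_L * 1e9  # mol/L converted to nmol/L


# Hypothetical loading: 1e8 oligonucleotides on one bead, 1000 pL partition.
conc = released_concentration_nM(1e8, 1000.0)
print(f"{conc:.0f} nM")
```

Under these assumptions, halving the partition volume or doubling the bead loading doubles the released concentration.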

In some aspects, the partitions refer to containers or vessels (such as wells, microwells, tubes, vials, through ports in nanoarray substrates, e.g., BioTrove nanoarrays, or other containers). In some aspects, the compartments or partitions comprise partitions that are flowable within fluid streams. These partitions may comprise, e.g., micro-vesicles that have an outer barrier surrounding an inner fluid center or core, or, in some cases, they may comprise a porous matrix that is capable of entraining and/or retaining materials within its matrix. In some aspects, partitions comprise droplets of aqueous fluid within a non-aqueous continuous phase, e.g., an oil phase. A variety of different vessels are described in, for example, U.S. Patent Application Publication No. 20140155295, the full disclosure of which is incorporated herein by reference in its entirety for all purposes. Emulsion systems for creating stable droplets in non-aqueous or oil continuous phases are described in detail in, e.g., U.S. Patent Application Publication No. 20100105112, the full disclosure of which is incorporated herein by reference in its entirety for all purposes.

In the case of droplets in an emulsion, allocating individual cells to discrete partitions may generally be accomplished by introducing a flowing stream of cells in an aqueous fluid into a flowing stream of a non-aqueous fluid, such that droplets are generated at the junction of the two streams. By providing the aqueous cell-containing stream at a certain concentration of cells, the occupancy of the resulting partitions (e.g., number of cells per partition) can be controlled. Where single cell partitions are desired, the relative flow rates of the fluids can be selected such that, on average, the partitions contain less than one cell per partition, in order to ensure that those partitions that are occupied are primarily singly occupied. In some embodiments, the relative flow rates of the fluids can be selected such that a majority of partitions are occupied, e.g., allowing for only a small percentage of unoccupied partitions. In some aspects, the flows and channel architectures are controlled so as to ensure a desired number of singly occupied partitions, less than a certain level of unoccupied partitions and less than a certain level of multiply occupied partitions.
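The dependence of average occupancy on cell concentration can be sketched with simple arithmetic. The following Python sketch assumes a well-mixed suspension in which the mean number of cells per droplet equals the cell concentration multiplied by the droplet volume; the function names and numerical values are hypothetical illustrations, not parameters from this disclosure.

```python
def mean_cells_per_droplet(cells_per_uL, droplet_volume_pL):
    """Expected number of cells per droplet for a well-mixed suspension."""
    droplet_volume_uL = droplet_volume_pL * 1e-6  # 1 pL = 1e-6 uL
    return cells_per_uL * droplet_volume_uL


def cell_concentration_for_target(target_mean, droplet_volume_pL):
    """Cell concentration (cells/uL) giving the target mean occupancy."""
    return target_mean / (droplet_volume_pL * 1e-6)


# Hypothetical example: 200 cells/uL partitioned into 500 pL droplets.
mean_occupancy = mean_cells_per_droplet(200.0, 500.0)
print(f"mean occupancy = {mean_occupancy:.2f} cells per droplet")
```

A mean well below one cell per droplet, as in this hypothetical case, corresponds to the condition described above in which occupied partitions are primarily singly occupied.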

The systems and methods described herein can be operated such that a majority of occupied partitions include no more than one cell per occupied partition. In some cases, the partitioning process is conducted such that fewer than 25% of the occupied partitions contain more than one cell, and in many cases, fewer than 20% of the occupied partitions have more than one cell. In some cases, fewer than 10% or even fewer than 5% of the occupied partitions include more than one cell per partition.

In some cases, it is desirable to avoid the creation of excessive numbers of empty partitions. For example, from a cost perspective and/or efficiency perspective, it may be desirable to minimize the number of empty partitions. While this may be accomplished by providing sufficient numbers of cells into the partitioning zone, the Poissonian distribution may expectedly increase the number of partitions that may include multiple cells. As such, in accordance with aspects described herein, the flow of one or more of the cells, or other fluids directed into the partitioning zone are conducted such that, in many cases, no more than 50% of the generated partitions, no more than 25% of the generated partitions, or no more than 10% of the generated partitions are unoccupied. Further, in some aspects, these flows are controlled so as to present non-Poissonian distribution of single occupied partitions while providing lower levels of unoccupied partitions. Restated, in some aspects, the above noted ranges of unoccupied partitions can be achieved while still providing any of the single occupancy rates described above. For example, in many cases, the use of the systems and methods described herein creates resulting partitions that have multiple occupancy rates of less than 25%, less than 20%, less than 15%, less than 10%, and in many cases, less than 5%, while having unoccupied partitions of less than 50%, less than 40%, less than 30%, less than 20%, less than 10%, and in some cases, less than 5%.
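The trade-off between empty and multiply occupied partitions under purely Poissonian loading can be illustrated numerically. The sketch below assumes that the number of cells per partition follows a Poisson distribution with a given mean; the function name and the chosen mean values are illustrative assumptions. It is provided only as a baseline against which the non-Poissonian loading described above may be compared.

```python
import math


def poisson_occupancy(mean_cells_per_partition):
    """Fractions of empty, singly and multiply occupied partitions (Poisson)."""
    lam = mean_cells_per_partition  # must be greater than zero
    empty = math.exp(-lam)
    single = lam * math.exp(-lam)
    multiple = 1.0 - empty - single
    occupied = 1.0 - empty
    return {
        "empty": empty,
        "single": single,
        "multiple": multiple,
        # Share of occupied partitions that hold more than one cell.
        "multiple_among_occupied": multiple / occupied,
    }


# Illustrative mean occupancies (hypothetical values).
for lam in (0.05, 0.1, 0.5, 1.0):
    r = poisson_occupancy(lam)
    print(f"mean={lam}: empty={r['empty']:.1%}, "
          f"multiple among occupied={r['multiple_among_occupied']:.1%}")
```

Under this assumption, raising the mean occupancy reduces the fraction of empty partitions but increases the fraction of multiply occupied partitions, which is the limitation the controlled flows described above are intended to address.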

As will be appreciated, the above-described occupancy rates are also applicable to partitions that include both cells and additional reagents, including, but not limited to, microcapsules carrying barcoded oligonucleotides. In some aspects, a substantial percentage of the overall occupied partitions can include both a microcapsule (e.g., bead) comprising barcoded oligonucleotides and a cell.

Although described in terms of providing substantially singly occupied partitions, above, in certain cases, it is desirable to provide multiply occupied partitions, e.g., containing two, three, four or more cells and/or microcapsules (e.g., beads) comprising barcoded oligonucleotides within a single partition. Accordingly, as noted above, the flow characteristics of the cell and/or bead containing fluids and partitioning fluids may be controlled to provide for such multiply occupied partitions. In particular, the flow parameters may be controlled to provide a desired occupancy rate at greater than 50% of the partitions, greater than 75%, and in some cases greater than 80%, 90%, 95%, or higher.

In some cases, additional microcapsules are used to deliver additional reagents to a partition. In such cases, it may be advantageous to introduce different beads into a common channel or droplet generation junction, from different bead sources, i.e., containing different associated reagents, through different channel inlets into such common channel or droplet generation junction. In such cases, the flow and frequency of the different beads into the channel or junction may be controlled to provide for the desired ratio of microcapsules from each source, while ensuring the desired pairing or combination of such beads into a partition with the desired number of cells.

The partitions described herein may comprise small volumes, e.g., less than 10 μL, less than 5 μL, less than 1 μL, less than 900 picoliters (pL), less than 800 pL, less than 700 pL, less than 600 pL, less than 500 pL, less than 400 pL, less than 300 pL, less than 200 pL, less than 100 pL, less than 50 pL, less than 20 pL, less than 10 pL, less than 1 pL, less than 500 nanoliters (nL), or even less than 100 nL, 50 nL, or even less.

For example, in the case of droplet based partitions, the droplets may have overall volumes that are less than 1000 pL, less than 900 pL, less than 800 pL, less than 700 pL, less than 600 pL, less than 500 pL, less than 400 pL, less than 300 pL, less than 200 pL, less than 100 pL, less than 50 pL, less than 20 pL, less than 10 pL, or even less than 1 pL. Where co-partitioned with microcapsules, it will be appreciated that the sample fluid volume, e.g., including co-partitioned cells, within the partitions may be less than 90% of the above described volumes, less than 80%, less than 70%, less than 60%, less than 50%, less than 40%, less than 30%, less than 20%, or even less than 10% of the above described volumes.

As is described elsewhere herein, partitioning species may generate a population or plurality of partitions. In such cases, any suitable number of partitions can be generated to generate the plurality of partitions. For example, in a method described herein, a plurality of partitions may be generated that comprises at least about 1,000 partitions, at least about 5,000 partitions, at least about 10,000 partitions, at least about 50,000 partitions, at least about 100,000 partitions, at least about 500,000 partitions, at least about 1,000,000 partitions, at least about 5,000,000 partitions, at least about 10,000,000 partitions, at least about 50,000,000 partitions, at least about 100,000,000 partitions, at least about 500,000,000 partitions or at least about 1,000,000,000 partitions. Moreover, the plurality of partitions may comprise both unoccupied partitions (e.g., empty partitions) and occupied partitions.

Microfluidic channel networks can be utilized to generate partitions as described herein. Alternative mechanisms may also be employed in the partitioning of individual cells, including porous membranes through which aqueous mixtures of cells are extruded into non-aqueous fluids.

An example of a simplified microfluidic channel structure for partitioning individual cells is illustrated in FIG. 1. As described elsewhere herein, in some cases, the majority of occupied partitions include no more than one cell per occupied partition and, in some cases, some of the generated partitions are unoccupied. In some cases, though, some of the occupied partitions may include more than one cell. In some cases, the partitioning process may be controlled such that fewer than 25% of the occupied partitions contain more than one cell, and in many cases, fewer than 20% of the occupied partitions have more than one cell, while in some cases, fewer than 10% or even fewer than 5% of the occupied partitions include more than one cell per partition. As shown, the channel structure can include channel segments 102, 104, 106 and 108 communicating at a channel junction 110. In operation, a first aqueous fluid 112 that includes suspended cells 114, may be transported along channel segment 102 into junction 110, while a second fluid 116 that is immiscible with the aqueous fluid 112 is delivered to the junction 110 from channel segments 104 and 106 to create discrete droplets 118 of the aqueous fluid including individual cells 114, flowing into channel segment 108.

In some aspects, this second fluid 116 comprises an oil, such as a fluorinated oil, that includes a fluorosurfactant for stabilizing the resulting droplets, e.g., inhibiting subsequent coalescence of the resulting droplets. Examples of particularly useful partitioning fluids and fluorosurfactants are described, for example, in U.S. Patent Application Publication No. 20100105112, the full disclosure of which is hereby incorporated herein by reference in its entirety for all purposes.

In other aspects, in addition to or as an alternative to droplet based partitioning, cells may be encapsulated within a microcapsule that comprises an outer shell or layer or porous matrix in which is entrained one or more individual cells or small groups of cells, and may include other reagents. Encapsulation of cells may be carried out by a variety of processes. Such processes combine an aqueous fluid containing the cells to be analyzed with a polymeric precursor material that may be capable of being formed into a gel or other solid or semi-solid matrix upon application of a particular stimulus to the polymer precursor. Such stimuli include, e.g., thermal stimuli (either heating or cooling), photo-stimuli (e.g., through photo-curing), chemical stimuli (e.g., through crosslinking, polymerization initiation of the precursor (e.g., through added initiators)), or the like.

Preparation of microcapsules comprising cells may be carried out by a variety of methods. For example, air knife droplet or aerosol generators may be used to dispense droplets of precursor fluids into gelling solutions in order to form microcapsules that include individual cells or small groups of cells. Likewise, membrane based encapsulation systems may be used to generate microcapsules comprising encapsulated cells as described herein. In some aspects, microfluidic systems like that shown in FIG. 1 may be readily used in encapsulating cells as described herein. In particular, and with reference to FIG. 1, the aqueous fluid comprising the cells and the polymer precursor material is flowed into channel junction 110, where it is partitioned into droplets 118 comprising the individual cells 114, through the flow of non-aqueous fluid 116. In the case of encapsulation methods, non-aqueous fluid 116 may also include an initiator to cause polymerization and/or crosslinking of the polymer precursor to form the microcapsule that includes the entrained cells. Examples of polymer precursor/initiator pairs include those described in U.S. Patent Application Publication No. 20140378345, the full disclosure of which is hereby incorporated herein by reference in its entirety for all purposes.

For example, in the case where the polymer precursor material comprises a linear polymer material, e.g., a linear polyacrylamide, PEG, or other linear polymeric material, the activation agent may comprise a cross-linking agent, or a chemical that activates a cross-linking agent within the formed droplets. Likewise, for polymer precursors that comprise polymerizable monomers, the activation agent may comprise a polymerization initiator. For example, in certain cases, where the polymer precursor comprises a mixture of acrylamide monomer with an N,N′-bis-(acryloyl)cystamine (BAC) comonomer, an agent such as tetramethylethylenediamine (TEMED) may be provided within the second fluid streams in channel segments 104 and 106, which initiates the copolymerization of the acrylamide and BAC into a cross-linked polymer network, or hydrogel.

Upon contact of the second fluid stream 116 with the first fluid stream 112 at junction 110 in the formation of droplets, the TEMED may diffuse from the second fluid 116 into the aqueous first fluid 112 comprising the linear polyacrylamide, which will activate the crosslinking of the polyacrylamide within the droplets, resulting in the formation of the gel, e.g., hydrogel, microcapsules 118, as solid or semi-solid beads or particles entraining the cells 114. Although described in terms of polyacrylamide encapsulation, other ‘activatable’ encapsulation compositions may also be employed in the context of the methods and compositions described herein. For example, formation of alginate droplets followed by exposure to divalent metal ions, e.g., Ca2+, can be used as an encapsulation process using the described processes. Likewise, agarose droplets may also be transformed into capsules through temperature based gelling, e.g., upon cooling, or the like. In some cases, encapsulated cells can be selectively releasable from the microcapsule, e.g., through passage of time, or upon application of a particular stimulus, that degrades the microcapsule sufficiently to allow the cell, or its contents to be released from the microcapsule, e.g., into a partition, such as a droplet. For example, in the case of the polyacrylamide polymer described above, degradation of the microcapsule may be accomplished through the introduction of an appropriate reducing agent, such as DTT or the like, to cleave disulfide bonds that cross link the polymer matrix (See, e.g., U.S. Patent Application Publication No. 20140378345, the full disclosure of which is hereby incorporated herein by reference in its entirety for all purposes).

Encapsulated cells or cell populations provide certain potential advantages of being storable, and more portable than droplet based partitioned cells. Furthermore, in some cases, it may be desirable to allow cells to be analyzed to incubate for a select period of time, in order to characterize changes in such cells over time, either in the presence or absence of different stimuli. In such cases, encapsulation of individual cells may allow for longer incubation than partitioning in emulsion droplets, although in some cases, droplet partitioned cells may also be incubated for different periods of time, e.g., at least 10 seconds, at least 30 seconds, at least 1 minute, at least 5 minutes, at least 10 minutes, at least 30 minutes, at least 1 hour, at least 2 hours, at least 5 hours, or at least 10 hours or more. The encapsulation of cells may constitute the partitioning of the cells into which other reagents are co-partitioned. Alternatively, encapsulated cells may be readily deposited into other partitions, e.g., droplets, as described above.

In accordance with certain aspects, the cells may be partitioned along with lysis reagents in order to release the contents of the cells within the partition. In such cases, the lysis agents can be contacted with the cell suspension concurrently with, or immediately prior to the introduction of the cells into the partitioning junction/droplet generation zone, e.g., through an additional channel or channels upstream of channel junction 110. Examples of lysis agents include bioactive reagents, such as lysis enzymes that are used for lysis of different cell types, e.g., gram positive or negative bacteria, plants, yeast, mammalian, etc., such as lysozymes, achromopeptidase, lysostaphin, labiase, kitalase, lyticase, and a variety of other lysis enzymes available from, e.g., Sigma-Aldrich, Inc. (St. Louis, Mo.), as well as other commercially available lysis enzymes. Other lysis agents may additionally or alternatively be co-partitioned with the cells to cause the release of the cell's contents into the partitions. For example, in some cases, surfactant based lysis solutions may be used to lyse cells, although these may be less desirable for emulsion based systems where the surfactants can interfere with stable emulsions. In some cases, lysis solutions may include non-ionic surfactants such as, for example, Triton X-100 and Tween 20. In some cases, lysis solutions may include ionic surfactants such as, for example, sarcosyl and sodium dodecyl sulfate (SDS). Electroporation, thermal, acoustic or mechanical cellular disruption may also be used in certain cases, e.g., non-emulsion based partitioning such as encapsulation of cells that may be in addition to or in place of droplet partitioning, where any pore size of the encapsulate is sufficiently small to retain nucleic acid fragments of a desired size, following cellular disruption.

In addition to the lysis agents co-partitioned with the cells described above, other reagents can also be co-partitioned with the cells, including, for example, DNase and RNase inactivating agents or inhibitors, such as proteinase K, chelating agents, such as EDTA, and other reagents employed in removing or otherwise reducing negative activity or impact of different cell lysate components on subsequent processing of nucleic acids. In addition, in the case of encapsulated cells, the cells may be exposed to an appropriate stimulus to release the cells or their contents from a co-partitioned microcapsule. For example, in some cases, a chemical stimulus may be co-partitioned along with an encapsulated cell to allow for the degradation of the microcapsule and release of the cell or its contents into the larger partition. In some cases, this stimulus may be the same as the stimulus described elsewhere herein for release of oligonucleotides from their respective microcapsule (e.g., bead). In alternative aspects, this may be a different and non-overlapping stimulus, in order to allow an encapsulated cell to be released into a partition at a different time from the release of oligonucleotides into the same partition.

Additional reagents may also be co-partitioned with the cells, such as endonucleases to fragment the cell's DNA, DNA polymerase enzymes and dNTPs used to amplify the cell's nucleic acid fragments and to attach the barcode oligonucleotides to the amplified fragments. Additional reagents may also include reverse transcriptase enzymes, including enzymes with terminal transferase activity, primers and oligonucleotides, and switch oligonucleotides (also referred to herein as “switch oligos” or “template switching oligonucleotides”) which can be used for template switching. In some cases, template switching can be used to increase the length of a cDNA. In some cases, template switching can be used to append a predefined nucleic acid sequence to the cDNA. In one example of template switching, cDNA can be generated from reverse transcription of a template, e.g., cellular mRNA, where a reverse transcriptase with terminal transferase activity can add additional nucleotides, e.g., polyC, to the cDNA in a template independent manner. Switch oligos can include sequences complementary to the additional nucleotides, e.g., polyG. The additional nucleotides (e.g., polyC) on the cDNA can hybridize to the complementary nucleotides (e.g., polyG) on the switch oligo, whereby the switch oligo can be used by the reverse transcriptase as a template to further extend the cDNA. Template switching oligonucleotides may comprise a hybridization region and a template region. The hybridization region can comprise any sequence capable of hybridizing to the target. In some cases, as previously described, the hybridization region comprises a series of G bases to complement the overhanging C bases at the 3′ end of a cDNA molecule. The series of G bases may comprise 1 G base, 2 G bases, 3 G bases, 4 G bases, 5 G bases or more than 5 G bases. The template region can comprise any sequence to be incorporated into the cDNA. In some cases, the template region comprises at least 1 (e.g., at least 2, 3, 4, 5 or more) tag sequences and/or functional sequences. Switch oligos may comprise deoxyribonucleic acids; ribonucleic acids; modified nucleic acids including 2-Aminopurine, 2,6-Diaminopurine (2-Amino-dA), inverted dT, 5-Methyl dC, 2′-deoxyInosine, Super T (5-hydroxybutynl-2′-deoxyuridine), Super G (8-aza-7-deazaguanosine), locked nucleic acids (LNAs), unlocked nucleic acids (UNAs, e.g., UNA-A, UNA-U, UNA-C, UNA-G), Iso-dG, Iso-dC, 2′ Fluoro bases (e.g., Fluoro C, Fluoro U, Fluoro A, and Fluoro G), or any combination.
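
By way of illustration only, the template switching example above can be sketched at the sequence level. The following is a simplified model and not a description of any particular enzyme or reagent: the function names, the example sequences and the fixed tail length of three bases are assumptions made for the sketch, and the switch oligo is assumed to be written 5′ to 3′ as a template region followed by its G bases.

```python
COMPLEMENT = {"A": "T", "C": "G", "G": "C", "T": "A"}


def reverse_complement(seq: str) -> str:
    """Return the reverse complement of a DNA sequence written 5'->3'."""
    return "".join(COMPLEMENT[base] for base in reversed(seq))


def template_switch(cdna: str, switch_oligo: str, n_tail: int = 3) -> str:
    """Extend a cDNA (5'->3') by copying the template region of a switch oligo.

    Assumes the cDNA ends in n_tail non-templated C bases and that the switch
    oligo (5'->3') is a template region followed by n_tail G bases.
    """
    if n_tail < 1:
        raise ValueError("n_tail must be at least 1")
    if not cdna.endswith("C" * n_tail):
        raise ValueError("cDNA lacks the expected poly-C overhang")
    if not switch_oligo.endswith("G" * n_tail):
        raise ValueError("switch oligo lacks the expected poly-G region")
    template_region = switch_oligo[:-n_tail]
    # The G bases pair with the C overhang; extension of the cDNA 3' end then
    # copies the template region, appending its reverse complement.
    return cdna + reverse_complement(template_region)


# Hypothetical sequences chosen only for the example.
extended = template_switch("ATGGCTCCC", "AAGCAGTG" + "GGG")
print(extended)
```

In this sketch the predefined sequence appended to the cDNA is the reverse complement of the template region, which is how a tag or functional sequence carried by the switch oligo would end up at the 3′ end of the cDNA.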

In some cases, the length of a switch oligo may be 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14, 15, 16, 17, 18, 19, 20, 21, 22, 23, 24, 25, 26, 27, 28, 29, 30, 31, 32, 33, 34, 35, 36, 37, 38, 39, 40, 41, 42, 43, 44, 45, 46, 47, 48, 49, 50, 51, 52, 53, 54, 55, 56, 57, 58, 59, 60, 61, 62, 63, 64, 65, 66, 67, 68, 69, 70, 71, 72, 73, 74, 75, 76, 77, 78, 79, 80, 81, 82, 83, 84, 85, 86, 87, 88, 89, 90, 91, 92, 93, 94, 95, 96, 97, 98, 99, 100, 101, 102, 103, 104, 105, 106, 107, 108, 109, 110, 111, 112, 113, 114, 115, 116, 117, 118, 119, 120, 121, 122, 123, 124, 125, 126, 127, 128, 129, 130, 131, 132, 133, 134, 135, 136, 137, 138, 139, 140, 141, 142, 143, 144, 145, 146, 147, 148, 149, 150, 151, 152, 153, 154, 155, 156, 157, 158, 159, 160, 161, 162, 163, 164, 165, 166, 167, 168, 169, 170, 171, 172, 173, 174, 175, 176, 177, 178, 179, 180, 181, 182, 183, 184, 185, 186, 187, 188, 189, 190, 191, 192, 193, 194, 195, 196, 197, 198, 199, 200, 201, 202, 203, 204, 205, 206, 207, 208, 209, 210, 211, 212, 213, 214, 215, 216, 217, 218, 219, 220, 221, 222, 223, 224, 225, 226, 227, 228, 229, 230, 231, 232, 233, 234, 235, 236, 237, 238, 239, 240, 241, 242, 243, 244, 245, 246, 247, 248, 249, 250 nucleotides or longer.

In some cases, the length of a switch oligo may be at least 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14, 15, 16, 17, 18, 19, 20, 21, 22, 23, 24, 25, 26, 27, 28, 29, 30, 31, 32, 33, 34, 35, 36, 37, 38, 39, 40, 41, 42, 43, 44, 45, 46, 47, 48, 49, 50, 51, 52, 53, 54, 55, 56, 57, 58, 59, 60, 61, 62, 63, 64, 65, 66, 67, 68, 69, 70, 71, 72, 73, 74, 75, 76, 77, 78, 79, 80, 81, 82, 83, 84, 85, 86, 87, 88, 89, 90, 91, 92, 93, 94, 95, 96, 97, 98, 99, 100, 101, 102, 103, 104, 105, 106, 107, 108, 109, 110, 111, 112, 113, 114, 115, 116, 117, 118, 119, 120, 121, 122, 123, 124, 125, 126, 127, 128, 129, 130, 131, 132, 133, 134, 135, 136, 137, 138, 139, 140, 141, 142, 143, 144, 145, 146, 147, 148, 149, 150, 151, 152, 153, 154, 155, 156, 157, 158, 159, 160, 161, 162, 163, 164, 165, 166, 167, 168, 169, 170, 171, 172, 173, 174, 175, 176, 177, 178, 179, 180, 181, 182, 183, 184, 185, 186, 187, 188, 189, 190, 191, 192, 193, 194, 195, 196, 197, 198, 199, 200, 201, 202, 203, 204, 205, 206, 207, 208, 209, 210, 211, 212, 213, 214, 215, 216, 217, 218, 219, 220, 221, 222, 223, 224, 225, 226, 227, 228, 229, 230, 231, 232, 233, 234, 235, 236, 237, 238, 239, 240, 241, 242, 243, 244, 245, 246, 247, 248, 249 or 250 nucleotides or longer.

In some cases, the length of a switch oligo may be at most 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14, 15, 16, 17, 18, 19, 20, 21, 22, 23, 24, 25, 26, 27, 28, 29, 30, 31, 32, 33, 34, 35, 36, 37, 38, 39, 40, 41, 42, 43, 44, 45, 46, 47, 48, 49, 50, 51, 52, 53, 54, 55, 56, 57, 58, 59, 60, 61, 62, 63, 64, 65, 66, 67, 68, 69, 70, 71, 72, 73, 74, 75, 76, 77, 78, 79, 80, 81, 82, 83, 84, 85, 86, 87, 88, 89, 90, 91, 92, 93, 94, 95, 96, 97, 98, 99, 100, 101, 102, 103, 104, 105, 106, 107, 108, 109, 110, 111, 112, 113, 114, 115, 116, 117, 118, 119, 120, 121, 122, 123, 124, 125, 126, 127, 128, 129, 130, 131, 132, 133, 134, 135, 136, 137, 138, 139, 140, 141, 142, 143, 144, 145, 146, 147, 148, 149, 150, 151, 152, 153, 154, 155, 156, 157, 158, 159, 160, 161, 162, 163, 164, 165, 166, 167, 168, 169, 170, 171, 172, 173, 174, 175, 176, 177, 178, 179, 180, 181, 182, 183, 184, 185, 186, 187, 188, 189, 190, 191, 192, 193, 194, 195, 196, 197, 198, 199, 200, 201, 202, 203, 204, 205, 206, 207, 208, 209, 210, 211, 212, 213, 214, 215, 216, 217, 218, 219, 220, 221, 222, 223, 224, 225, 226, 227, 228, 229, 230, 231, 232, 233, 234, 235, 236, 237, 238, 239, 240, 241, 242, 243, 244, 245, 246, 247, 248, 249 or 250 nucleotides.

Once the contents of the cells are released into their respective partitions, the nucleic acids contained therein may be further processed within the partitions. In accordance with the methods and systems described herein, the nucleic acid contents of individual cells can be provided with unique identifiers such that, upon characterization of those nucleic acids, they may be attributed as having been derived from the same cell or cells. The ability to attribute characteristics to individual cells or groups of cells is provided by the assignment of unique identifiers specifically to an individual cell or groups of cells. Unique identifiers, e.g., in the form of nucleic acid barcodes, can be assigned to or associated with individual cells or populations of cells, in order to tag or label the cell's components (and as a result, its characteristics) with the unique identifiers. These unique identifiers can then be used to attribute the cell's components and characteristics to an individual cell or group of cells. In some aspects, this is carried out by co-partitioning the individual cells or groups of cells with the unique identifiers. In some aspects, the unique identifiers are provided in the form of oligonucleotides that comprise nucleic acid barcode sequences that may be attached to or otherwise associated with the nucleic acid contents of individual cells, or to other components of the cells, and particularly to fragments of those nucleic acids. The oligonucleotides are partitioned such that, as between oligonucleotides in a given partition, the nucleic acid barcode sequences contained therein are the same, but as between different partitions, the oligonucleotides can, and do, have differing barcode sequences, or at least represent a large number of different barcode sequences across all of the partitions in a given analysis. In some aspects, only one nucleic acid barcode sequence can be associated with a given partition, although in some cases, two or more different barcode sequences may be present.

The nucleic acid barcode sequences can include from 6 to about 20 or more nucleotides within the sequence of the oligonucleotides. In some cases, the length of a barcode sequence may be 6, 7, 8, 9, 10, 11, 12, 13, 14, 15, 16, 17, 18, 19, 20 nucleotides or longer. In some cases, the length of a barcode sequence may be at least 6, 7, 8, 9, 10, 11, 12, 13, 14, 15, 16, 17, 18, 19, 20 nucleotides or longer. In some cases, the length of a barcode sequence may be at most 6, 7, 8, 9, 10, 11, 12, 13, 14, 15, 16, 17, 18, 19, 20 nucleotides or shorter. These nucleotides may be completely contiguous, i.e., in a single stretch of adjacent nucleotides, or they may be separated into two or more separate subsequences that are separated by 1 or more nucleotides. In some cases, separated barcode subsequences can be from about 4 to about 16 nucleotides in length. In some cases, the barcode subsequence may be 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14, 15, 16 nucleotides or longer. In some cases, the barcode subsequence may be at least 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14, 15, 16 nucleotides or longer. In some cases, the barcode subsequence may be at most 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14, 15, 16 nucleotides or shorter.
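
As a non-limiting illustration of how barcode length relates to the number of distinct barcode sequences available, and of how a barcode split into two subsequences may be read back as a single identifier, consider the following sketch. It assumes a four-letter nucleotide alphabet with no sequences excluded; the function names, the example oligonucleotide and the positions used are hypothetical.

```python
def barcode_space(length: int) -> int:
    """Number of distinct sequences of the given length over A, C, G and T."""
    if length < 0:
        raise ValueError("length must be non-negative")
    return 4 ** length


def join_subsequences(oligo: str, start: int, first_len: int,
                      gap: int, second_len: int) -> str:
    """Read a barcode stored as two subsequences separated by `gap` bases."""
    first = oligo[start:start + first_len]
    second_start = start + first_len + gap
    second = oligo[second_start:second_start + second_len]
    if len(first) != first_len or len(second) != second_len:
        raise ValueError("oligo is too short for the requested layout")
    return first + second


for n in (6, 16, 20):
    print(n, barcode_space(n))

# Hypothetical layout: 6-base subsequence, 2-base spacer, 6-base subsequence.
print(join_subsequences("ACGTAC" + "TT" + "GGCATA", 0, 6, 2, 6))
```

Under these assumptions a 6-nucleotide barcode admits 4,096 sequences and a 16-nucleotide barcode admits 4,294,967,296, which is one way to see why longer barcodes can support the larger libraries described herein.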

The co-partitioned oligonucleotides can also comprise other functional sequences useful in the processing of the nucleic acids from the co-partitioned cells. These sequences include, e.g., targeted or random/universal amplification primer sequences for amplifying the genomic DNA from the individual cells within the partitions while attaching the associated barcode sequences, sequencing primers or primer recognition sites, hybridization or probing sequences, e.g., for identification of presence of the sequences or for pulling down barcoded nucleic acids, or any of a number of other potential functional sequences. Other mechanisms of co-partitioning oligonucleotides may also be employed, including, e.g., coalescence of two or more droplets, where one droplet contains oligonucleotides, or microdispensing of oligonucleotides into partitions, e.g., droplets within microfluidic systems.

Briefly, in one example, microcapsules, such as beads, are provided that each include large numbers of the above described barcoded oligonucleotides releasably attached to the beads, where all of the oligonucleotides attached to a particular bead will include the same nucleic acid barcode sequence, but where a large number of diverse barcode sequences are represented across the population of beads used. In some embodiments, hydrogel beads, e.g., comprising polyacrylamide polymer matrices, are used as a solid support and delivery vehicle for the oligonucleotides into the partitions, as they are capable of carrying large numbers of oligonucleotide molecules, and may be configured to release those oligonucleotides upon exposure to a particular stimulus, as described elsewhere herein. In some cases, the population of beads will provide a diverse barcode sequence library that includes at least 1,000 different barcode sequences, at least 5,000 different barcode sequences, at least 10,000 different barcode sequences, at least 50,000 different barcode sequences, at least 100,000 different barcode sequences, at least 1,000,000 different barcode sequences, at least 5,000,000 different barcode sequences, or at least 10,000,000 different barcode sequences. Additionally, each bead can be provided with large numbers of oligonucleotide molecules attached. In particular, the number of molecules of oligonucleotides including the barcode sequence on an individual bead can be at least 1,000 oligonucleotide molecules, at least 5,000 oligonucleotide molecules, at least 10,000 oligonucleotide molecules, at least 50,000 oligonucleotide molecules, at least 100,000 oligonucleotide molecules, at least 500,000 oligonucleotide molecules, at least 1,000,000 oligonucleotide molecules, at least 5,000,000 oligonucleotide molecules, at least 10,000,000 oligonucleotide molecules, at least 50,000,000 oligonucleotide molecules, at least 100,000,000 oligonucleotide molecules, and in some cases at least 1 billion oligonucleotide molecules.

Moreover, when the population of beads is partitioned, the resulting population of partitions can also include a diverse barcode library that includes at least 1,000 different barcode sequences, at least 5,000 different barcode sequences, at least 10,000 different barcode sequences, at least 50,000 different barcode sequences, at least 100,000 different barcode sequences, at least 1,000,000 different barcode sequences, at least 5,000,000 different barcode sequences, or at least 10,000,000 different barcode sequences. Additionally, each partition of the population can include at least 1,000 oligonucleotide molecules, at least 5,000 oligonucleotide molecules, at least 10,000 oligonucleotide molecules, at least 50,000 oligonucleotide molecules, at least 100,000 oligonucleotide molecules, at least 500,000 oligonucleotide molecules, at least 1,000,000 oligonucleotide molecules, at least 5,000,000 oligonucleotide molecules, at least 10,000,000 oligonucleotide molecules, at least 50,000,000 oligonucleotide molecules, at least 100,000,000 oligonucleotide molecules, and in some cases at least 1 billion oligonucleotide molecules.

In some cases, it may be desirable to incorporate multiple different barcodes within a given partition, either attached to a single bead or to multiple beads within the partition. For example, in some cases, a mixed, but known, set of barcode sequences may provide greater assurance of identification in the subsequent processing, e.g., by providing a stronger address or attribution of the barcodes to a given partition, as a duplicate or independent confirmation of the output from a given partition.

The oligonucleotides are releasable from the beads upon the application of a particular stimulus to the beads. In some cases, the stimulus may be a photo-stimulus, e.g., through cleavage of a photo-labile linkage that releases the oligonucleotides. In other cases, a thermal stimulus may be used, where elevation of the temperature of the beads' environment will result in cleavage of a linkage or other release of the oligonucleotides from the beads. In still other cases, a chemical stimulus is used that cleaves a linkage of the oligonucleotides to the beads, or otherwise results in release of the oligonucleotides from the beads. In one case, such compositions include the polyacrylamide matrices described above for encapsulation of cells, and may be degraded for release of the attached oligonucleotides through exposure to a reducing agent, such as DTT.

In accordance with the methods and systems described herein, the beads including the attached oligonucleotides are co-partitioned with the individual cells, such that a single bead and a single cell are contained within an individual partition. As noted above, while single cell/single bead occupancy is the most desired state, it will be appreciated that multiply occupied partitions (either in terms of cells, beads or both), or unoccupied partitions (either in terms of cells, beads or both) will often be present. An example of a microfluidic channel structure for co-partitioning cells and beads comprising barcode oligonucleotides is schematically illustrated in FIG. 2. As described elsewhere herein, in some aspects, a substantial percentage of the overall occupied partitions will include both a bead and a cell and, in some cases, some of the partitions that are generated will be unoccupied. In some cases, some of the partitions may have beads and cells that are not partitioned 1:1. In some cases, it may be desirable to provide multiply occupied partitions, e.g., containing two, three, four or more cells and/or beads within a single partition. As shown, channel segments 202, 204, 206, 208 and 210 are provided in fluid communication at channel junction 212. An aqueous stream comprising the individual cells 214 is flowed through channel segment 202 toward channel junction 212. As described above, these cells may be suspended within an aqueous fluid, or may have been pre-encapsulated, prior to the partitioning process.

Concurrently, an aqueous stream comprising the barcode carrying beads 216 is flowed through channel segment 204 toward channel junction 212. A non-aqueous partitioning fluid 216 is introduced into channel junction 212 from each of side channels 206 and 208, and the combined streams are flowed into outlet channel 210. Within channel junction 212, the two aqueous streams from channel segments 202 and 204 are combined, and partitioned into droplets 218 that include co-partitioned cells 214 and beads 216. As noted previously, by controlling the flow characteristics of each of the fluids combining at channel junction 212, as well as controlling the geometry of the channel junction, partitioning can be optimized to achieve a desired occupancy level of beads, cells or both, within the partitions 218 that are generated.

In some cases, lysis agents, e.g., cell lysis enzymes, may be introduced into the partition with the bead stream, e.g., flowing through channel segment 204, such that lysis of the cell only commences at or after the time of partitioning. Additional reagents may also be added to the partition in this configuration, such as endonucleases to fragment the cell's DNA, DNA polymerase enzyme and dNTPs used to amplify the cell's nucleic acid fragments and to attach the barcode oligonucleotides to the amplified fragments. As noted above, in many cases, a chemical stimulus, such as DTT, may be used to release the barcodes from their respective beads into the partition. In such cases, it may be particularly desirable to provide the chemical stimulus along with the cell-containing stream in channel segment 202, such that release of the barcodes only occurs after the two streams have been combined, e.g., within the partitions 218. Where the cells are encapsulated, however, a common chemical stimulus, e.g., one that both releases the oligonucleotides from their beads and releases cells from their microcapsules, may generally be introduced from a separate additional side channel (not shown) upstream of or connected to channel junction 212.

A number of other reagents may be co-partitioned along with the cells, beads, lysis agents and chemical stimuli, including, for example, protective reagents, like proteinase K, chelators, nucleic acid extension, replication, transcription or amplification reagents such as polymerases, reverse transcriptases, transposases which can be used for transposon based methods (e.g., Nextera), nucleoside triphosphates or NTP analogues, primer sequences and additional cofactors such as divalent metal ions used in such reactions, ligation reaction reagents, such as ligase enzymes and ligation sequences, dyes, labels, or other tagging reagents.

The channel networks, e.g., as described herein, can be fluidly coupled to appropriate fluidic components. For example, the inlet channel segments, e.g., channel segments 202, 204, 206 and 208, are fluidly coupled to appropriate sources of the materials they are to deliver to channel junction 212. For example, channel segment 202 will be fluidly coupled to a source of an aqueous suspension of cells 214 to be analyzed, while channel segment 204 may be fluidly coupled to a source of an aqueous suspension of beads 216. Channel segments 206 and 208 may then be fluidly connected to one or more sources of the non-aqueous fluid. These sources may include any of a variety of different fluidic components, from simple reservoirs defined in or connected to a body structure of a microfluidic device, to fluid conduits that deliver fluids from off-device sources, manifolds, or the like. Likewise, the outlet channel segment 210 may be fluidly coupled to a receiving vessel or conduit for the partitioned cells. Again, this may be a reservoir defined in the body of a microfluidic device, or it may be a fluidic conduit for delivering the partitioned cells to a subsequent process operation, instrument or component.

FIG. 8 shows images of individual Jurkat cells co-partitioned along with barcode oligonucleotide containing beads in aqueous droplets in an aqueous-in-oil emulsion. As illustrated, individual cells may be readily co-partitioned with individual beads. As will be appreciated, optimization of individual cell loading may be carried out by a number of methods, including by providing dilutions of cell populations into the microfluidic system in order to achieve the desired cell loading per partition as described elsewhere herein.
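
To illustrate how dilution of the cell population may influence cell loading per partition, the following sketch estimates the fractions of empty, singly occupied and multiply occupied partitions. It assumes, purely for illustration, that the number of cells per partition follows a Poisson distribution with a chosen mean; that statistical model, the function names and the example mean are assumptions of the sketch rather than features of any particular system.

```python
import math


def poisson_pmf(k: int, mean: float) -> float:
    """Probability of exactly k cells in a partition under a Poisson model."""
    return math.exp(-mean) * mean ** k / math.factorial(k)


def occupancy(mean_cells_per_partition: float) -> dict:
    """Fractions of empty, single-cell and multi-cell partitions."""
    if mean_cells_per_partition < 0:
        raise ValueError("mean must be non-negative")
    empty = poisson_pmf(0, mean_cells_per_partition)
    single = poisson_pmf(1, mean_cells_per_partition)
    return {
        "empty": empty,
        "single": single,
        "multiple": 1.0 - empty - single,
    }


# Hypothetical loading of 0.1 cells per partition on average.
for state, fraction in occupancy(0.1).items():
    print(f"{state}: {fraction:.4f}")
```

Under this assumed model, a more dilute cell suspension lowers the fraction of multiply occupied partitions at the cost of a larger fraction of partitions containing no cell.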

In operation, once lysed, the nucleic acid contents of the individual cells are then available for further processing within the partitions, including, e.g., fragmentation, amplification and barcoding, as well as attachment of other functional sequences. As noted above, fragmentation may be accomplished through the co-partitioning of shearing enzymes, such as endonucleases, in order to fragment the nucleic acids into smaller fragments. These endonucleases may include restriction endonucleases, including type II and type IIs restriction endonucleases as well as other nucleic acid cleaving enzymes, such as nicking endonucleases, and the like. In some cases, fragmentation may not be desired, and full length nucleic acids may be retained within the partitions, or in the case of encapsulated cells or cell contents, fragmentation may be carried out prior to partitioning, e.g., through enzymatic methods, e.g., those described herein, or through mechanical methods, e.g., mechanical, acoustic or other shearing.

Once the cells are co-partitioned and lysed to release their nucleic acids, the oligonucleotides disposed upon the bead may be used to barcode and amplify fragments of those nucleic acids. Briefly, in one aspect, the oligonucleotides present on the beads that are co-partitioned with the cells are released from their beads into the partition with the cell's nucleic acids. Each oligonucleotide can include, along with the barcode sequence, a primer sequence at its 5′ end. This primer sequence may be a random oligonucleotide sequence intended to randomly prime numerous different regions on the cell's nucleic acids, or it may be a specific primer sequence targeted to prime upstream of a specific targeted region of the cell's genome.

Once released, the primer portion of the oligonucleotide can anneal to a complementary region of the cell's nucleic acid. Extension reaction reagents, e.g., DNA polymerase, nucleoside triphosphates, co-factors (e.g., Mg2+ or Mn2+), that are also co-partitioned with the cells and beads, then extend the primer sequence using the cell's nucleic acid as a template, to produce a complementary fragment to the strand of the cell's nucleic acid to which the primer annealed, which complementary fragment includes the oligonucleotide and its associated barcode sequence. Annealing and extension of multiple primers to different portions of the cell's nucleic acids will result in a large pool of overlapping complementary fragments of the nucleic acid, each possessing its own barcode sequence indicative of the partition in which it was created. In some cases, these complementary fragments may themselves be used as a template primed by the oligonucleotides present in the partition to produce a complement of the complement that, again, includes the barcode sequence. In some cases, this replication process is configured such that when the first complement is duplicated, it produces two complementary sequences at or near its termini, to allow formation of a hairpin structure or partial hairpin structure that reduces the ability of the molecule to be the basis for producing further iterative copies. As described herein, the cell's nucleic acids may include any desired nucleic acids within the cell including, for example, the cell's DNA, e.g., genomic DNA, RNA, e.g., messenger RNA, and the like. For example, in some cases, the methods and systems described herein are used in characterizing expressed mRNA, including, e.g., the presence and quantification of such mRNA, and may include RNA sequencing processes as the characterization process. Alternatively or additionally, the reagents partitioned along with the cells may include reagents for the conversion of mRNA into cDNA, e.g., reverse transcriptase enzymes and reagents, to facilitate sequencing processes where DNA sequencing is employed. In some cases, the nucleic acids to be characterized comprise RNA, e.g., mRNA. A schematic illustration of one example of this is shown in FIG. 3.

As shown, oligonucleotides that include a barcode sequence are co-partitioned in, e.g., a droplet 302 in an emulsion, along with a sample nucleic acid 304. As noted elsewhere herein, the oligonucleotides 308 may be provided on a bead 306 that is co-partitioned with the sample nucleic acid 304, which oligonucleotides are releasable from the bead 306, as shown in panel A. The oligonucleotides 308 include a barcode sequence 312, in addition to one or more functional sequences, e.g., sequences 310, 314 and 316. For example, oligonucleotide 308 is shown as comprising barcode sequence 312, as well as sequence 310 that may function as an attachment or immobilization sequence for a given sequencing system, e.g., a P5 sequence used for attachment in flow cells of an Illumina HiSeq® or MiSeq® system. As shown, the oligonucleotides also include a primer sequence 316, which may include a random or targeted N-mer for priming replication of portions of the sample nucleic acid 304. Also included within oligonucleotide 308 is a sequence 314 which may provide a sequencing priming region, such as a “read1” or R1 priming region, that is used to prime polymerase mediated, template directed sequencing by synthesis reactions in sequencing systems. As will be appreciated, the functional sequences may be selected to be compatible with a variety of different sequencing systems, e.g., 454 Sequencing, Ion Torrent Proton or PGM, Illumina X10, etc., and the requirements thereof. In many cases, the barcode sequence 312, immobilization sequence 310 and R1 sequence 314 may be common to all of the oligonucleotides attached to a given bead. The primer sequence 316 may vary for random N-mer primers, or may be common to the oligonucleotides on a given bead for certain targeted applications.
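
The arrangement of segments within oligonucleotide 308 can be pictured with a small data structure. The sketch below is illustrative only: the class and function names, the placeholder sequences and the 5′ to 3′ ordering of attachment sequence, barcode, R1 region and N-mer are assumptions made for the example, and the placeholder strings are not actual sequencing-system sequences.

```python
import random
from dataclasses import dataclass


@dataclass(frozen=True)
class BarcodeOligo:
    attachment: str  # stands in for sequence 310 (e.g., an attachment sequence)
    barcode: str     # stands in for barcode sequence 312
    read1: str       # stands in for sequencing primer region 314
    primer: str      # stands in for primer sequence 316 (random or targeted N-mer)

    def sequence(self) -> str:
        """Concatenate the segments in the order assumed for this sketch."""
        return self.attachment + self.barcode + self.read1 + self.primer


def oligos_for_bead(attachment: str, barcode: str, read1: str,
                    nmer_length: int, count: int,
                    rng: random.Random) -> list:
    """Oligos for one bead: shared barcode and functional segments, random N-mers."""
    return [
        BarcodeOligo(
            attachment, barcode, read1,
            "".join(rng.choice("ACGT") for _ in range(nmer_length)),
        )
        for _ in range(count)
    ]


# Placeholder sequences; a fixed seed keeps the example reproducible.
bead = oligos_for_bead("AATTCCGG", "ACGTACGTACGTACGT", "CTACACGACG",
                       nmer_length=7, count=5, rng=random.Random(0))
for oligo in bead:
    print(oligo.sequence())
```

In this sketch every oligonucleotide generated for a bead shares the same attachment, barcode and R1 segments, while the N-mer may differ from molecule to molecule, mirroring the random-primer case described above.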

As will be appreciated, in some cases, the functional sequences may include primer sequences useful for RNA-seq applications. For example, in some cases, the oligonucleotides may include poly-T primers for priming reverse transcription of RNA for RNA-seq. In still other cases, oligonucleotides in a given partition, e.g., included on an individual bead, may include multiple types of primer sequences in addition to the common barcode sequences, such as both DNA sequencing and RNA sequencing primers, e.g., poly-T primer sequences included within the oligonucleotides coupled to the bead. In such cases, a single partitioned cell may be subjected to both DNA and RNA sequencing processes.

Based upon the presence of primer sequence 316, the oligonucleotides can prime the sample nucleic acid as shown in panel B, which allows for extension of the oligonucleotides 308 and 308a using polymerase enzymes and other extension reagents also co-partitioned with the bead 306 and sample nucleic acid 304. As shown in panel C, following extension of the oligonucleotides that, for random N-mer primers, may anneal to multiple different regions of the sample nucleic acid 304, multiple overlapping complements or fragments of the nucleic acid are created, e.g., fragments 318 and 320. Although including sequence portions that are complementary to portions of sample nucleic acid, e.g., sequences 322 and 324, these constructs are generally referred to herein as comprising fragments of the sample nucleic acid 304, having the attached barcode sequences.

The barcoded nucleic acid fragments may then be subjected to characterization, e.g., through sequence analysis, or they may be further amplified in the process, as shown in panel D. For example, additional oligonucleotides, e.g., oligonucleotide 308b, also released from bead 306, may prime the fragments 318 and 320. This is shown for fragment 318. In particular, again, based upon the presence of the random N-mer primer 316b in oligonucleotide 308b (which in many cases can be different from other random N-mers in a given partition, e.g., primer sequence 316), the oligonucleotide anneals with the fragment 318, and is extended to create a complement 326 to at least a portion of fragment 318 which includes sequence 328, that comprises a duplicate of a portion of the sample nucleic acid sequence. Extension of the oligonucleotide 308b continues until it has replicated through the oligonucleotide portion 308 of fragment 318. As noted elsewhere herein, and as illustrated in panel D, the oligonucleotides may be configured to prompt a stop in the replication by the polymerase at a desired point, e.g., after replicating through sequences 316 and 314 of oligonucleotide 308 that is included within fragment 318. As described herein, this may be accomplished by different methods, including, for example, the incorporation of different nucleotides and/or nucleotide analogues that are not capable of being processed by the polymerase enzyme used. For example, this may include the inclusion of uracil containing nucleotides within the sequence region 312 to cause a non-uracil tolerant polymerase to cease replication of that region. As a result, a fragment 326 is created that includes the full-length oligonucleotide 308b at one end, including the barcode sequence 312, the attachment sequence 310, the R1 primer region 314, and the random N-mer sequence 316b. At the other end of the sequence may be included the complement 316′ to the random N-mer of the first oligonucleotide 308, as well as a complement to all or a portion of the R1 sequence, shown as sequence 314′. The R1 sequence 314 and its complement 314′ are then able to hybridize together to form a partial hairpin structure 328. As will be appreciated, because the random N-mers differ among different oligonucleotides, these sequences and their complements may not be expected to participate in hairpin formation, e.g., sequence 316′, which is the complement to random N-mer 316, may not be expected to be complementary to random N-mer sequence 316b. This may not be the case for other applications, e.g., targeted primers, where the N-mers may be common among oligonucleotides within a given partition.
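
The partial hairpin described above can be illustrated at the sequence level with the following sketch. It is a simplified model under stated assumptions: the segment order within the oligonucleotide, the placeholder sequences and the function names are hypothetical, replication is assumed to stop after the R1 region of the first oligonucleotide, and base pairing is treated as exact reverse complementarity.

```python
COMPLEMENT = {"A": "T", "C": "G", "G": "C", "T": "A"}


def reverse_complement(seq: str) -> str:
    """Return the reverse complement of a DNA sequence written 5'->3'."""
    return "".join(COMPLEMENT[base] for base in reversed(seq))


def build_second_copy(attachment: str, barcode: str, read1: str,
                      nmer_b: str, insert: str, nmer_a: str) -> str:
    """Model a copy of a barcoded fragment, written 5'->3'.

    The 5' end carries the full second oligo; the 3' end carries the
    complements of the first oligo's N-mer and R1 region, where replication
    is assumed to have stopped before the barcode.
    """
    return (attachment + barcode + read1 + nmer_b + insert
            + reverse_complement(nmer_a) + reverse_complement(read1))


def can_form_partial_hairpin(fragment: str, read1: str) -> bool:
    """True if the 3' end can pair with an R1 region located upstream."""
    stem = reverse_complement(read1)
    return fragment.endswith(stem) and read1 in fragment[:-len(stem)]


# Placeholder sequences chosen only for the example.
copy = build_second_copy("AATTCCGG", "ACGTACGTAC", "CTACACGACG",
                         nmer_b="GATTACA", insert="TTGACCGTAAGC",
                         nmer_a="CCATGGA")
print(can_form_partial_hairpin(copy, "CTACACGACG"))
```

A first-level fragment that lacks the 3′ complement of the R1 region would fail the same check in this model, which corresponds to the statement that the hairpin forms on the duplicated copies.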

Forming these partial hairpin structures allows for the removal of first level duplicates of the sample sequence from further replication, e.g., preventing iterative copying of copies. The partial hairpin structure also provides a useful structure for subsequent processing of the created fragments, e.g., fragment 326.

In general, the amplification of the cell's nucleic acids is carried out until the barcoded overlapping fragments within the partition constitute at least 1× coverage of the particular portion or all of the cell's genome, at least 2×, at least 3×, at least 4×, at least 5×, at least 10×, at least 20×, at least 40× or more coverage of the genome or its relevant portion of interest. Once the barcoded fragments are produced, they may be directly sequenced on an appropriate sequencing system, e.g., an Illumina HiSeq®, MiSeq® or X10 system, or they may be subjected to additional processing, such as further amplification, attachment of other functional sequences, e.g., second sequencing primers, for reverse reads, sample index sequences, and the like.
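
As a simple worked illustration of the coverage levels mentioned above, mean fold coverage may be estimated as the total number of sample-derived bases in the barcoded fragments divided by the length of the genome or portion of interest. The sketch below uses that definition; the function names and the example numbers are hypothetical, and the estimate ignores how evenly fragments are distributed.

```python
import math


def mean_coverage(fragment_lengths, target_length: int) -> float:
    """Mean fold coverage: total fragment bases divided by target length."""
    if target_length <= 0:
        raise ValueError("target_length must be positive")
    return sum(fragment_lengths) / target_length


def fragments_needed(target_length: int, mean_fragment_length: int,
                     coverage: float) -> int:
    """Smallest fragment count whose total bases reach the requested coverage."""
    if mean_fragment_length <= 0:
        raise ValueError("mean_fragment_length must be positive")
    return math.ceil(coverage * target_length / mean_fragment_length)


# Hypothetical 5,000-base region covered by twenty 500-base fragments.
print(mean_coverage([500] * 20, 5000))
# Fragments of 500 bases needed for 10x mean coverage of the same region.
print(fragments_needed(5000, 500, 10))
```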

All of the fragments from multiple different partitions may then be pooled for sequencing on high throughput sequencers as described herein, where the pooled fragments comprise a large number of fragments derived from the nucleic acids of different cells or small cell populations, but where the fragments from the nucleic acids of a given cell will share the same barcode sequence. In particular, because each fragment is coded as to its partition of origin, and consequently its single cell or small population of cells, the sequence of that fragment may be attributed back to that cell or those cells based upon the presence of the barcode, which will also aid in applying the various sequence fragments from multiple partitions to assembly of individual genomes for different cells. This is schematically illustrated in FIG. 4. As shown in one example, a first nucleic acid 404 from a first cell 400, and a second nucleic acid 406 from a second cell 402 are each partitioned along with their own sets of barcode oligonucleotides as described above. The nucleic acids may comprise a chromosome, entire genome or other large nucleic acid from the cells.

Within each partition, each cell's nucleic acids 404 and 406 are then processed to separately provide overlapping sets of second fragments of the first fragment(s), e.g., second fragment sets 408 and 410. This processing also provides the second fragments with a barcode sequence that is the same for each of the second fragments derived from a particular first fragment. As shown, the barcode sequence for second fragment set 408 is denoted by “1” while the barcode sequence for fragment set 410 is denoted by “2”. A diverse library of barcodes may be used to differentially barcode large numbers of different fragment sets. However, it is not necessary for every second fragment set from a different first fragment to be barcoded with different barcode sequences. In fact, in many cases, multiple different first fragments may be processed concurrently to include the same barcode sequence. Diverse barcode libraries are described in detail elsewhere herein.

The barcoded fragments, e.g., from fragment sets 408 and 410, may then be pooled for sequencing using, for example, sequence by synthesis technologies available from Illumina or the Ion Torrent division of Thermo-Fisher, Inc. Once sequenced, the sequence reads 412 can be attributed to their respective fragment set, e.g., as shown in aggregated reads 414 and 416, at least in part based upon the included barcodes, and in some cases, in part based upon the sequence of the fragment itself. The attributed sequence reads for each fragment set are then assembled to provide the assembled sequence for each cell's nucleic acids, e.g., sequences 418 and 420, which in turn, may be attributed to individual cells, e.g., cells 400 and 402.
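
The attribution of pooled sequence reads to their fragment sets based upon the included barcodes can be sketched as a simple grouping step. The example below assumes, for illustration only, that the barcode occupies a fixed number of bases at the start of each read and is read without error; the function name and the example reads are hypothetical, and assembly of the grouped reads is not shown.

```python
from collections import defaultdict


def group_reads_by_barcode(reads, barcode_length: int) -> dict:
    """Group the non-barcode portion of each read under its barcode."""
    if barcode_length <= 0:
        raise ValueError("barcode_length must be positive")
    groups = defaultdict(list)
    for read in reads:
        if len(read) <= barcode_length:
            continue  # too short to carry both a barcode and sample sequence
        barcode, insert = read[:barcode_length], read[barcode_length:]
        groups[barcode].append(insert)
    return dict(groups)


# Hypothetical pooled reads carrying two different 6-base barcodes.
pooled = ["AAAAAACGT", "CCCCCCTTG", "AAAAAAGGA", "CCC"]
for barcode, inserts in group_reads_by_barcode(pooled, 6).items():
    print(barcode, inserts)
```

In practice, attribution may also take sequencing errors in the barcode and the sequence of the fragment itself into account, which this sketch does not attempt.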

While described in terms of analyzing the genetic material present within cells, the methods and systems described herein may have much broader applicability, including the ability to characterize other aspects of individual cells or cell populations, by allowing for the allocation of reagents to individual cells, and providing for the attributable analysis or characterization of those cells in response to those reagents. These methods and systems are particularly valuable in being able to characterize cells for, e.g., research, diagnostic, pathogen identification, and many other purposes. By way of example, a wide range of different cell surface features, e.g., cell surface proteins like cluster of differentiation or CD proteins, have significant diagnostic relevance in characterization of diseases like cancer.

In one particularly useful application, the methods and systems described herein may be used to characterize cell features, such as cell surface features, e.g., proteins, receptors, etc. In particular, the methods described herein may be used to attach reporter molecules to these cell features, that when partitioned as described above, may be barcoded and analyzed, e.g., using DNA sequencing technologies, to ascertain the presence, and in some cases, relative abundance or quantity of such cell features within an individual cell or population of cells.

In a particular example, a library of potential cell binding ligands, e.g., antibodies, antibody fragments, cell surface receptor binding molecules, or the like, may be provided associated with a first set of nucleic acid reporter molecules, e.g., where a different reporter oligonucleotide sequence is associated with a specific ligand, and therefore capable of binding to a specific cell surface feature. In some aspects, different members of the library may be characterized by the presence of a different oligonucleotide sequence label, e.g., an antibody to a first type of cell surface protein or receptor may have associated with it a first known reporter oligonucleotide sequence, while an antibody to a second receptor protein may have a different known reporter oligonucleotide sequence associated with it. Prior to co-partitioning, the cells may be incubated with the library of ligands, that may represent antibodies to a broad panel of different cell surface features, e.g., receptors, proteins, etc., and which include their associated reporter oligonucleotides. Unbound ligands are washed from the cells, and the cells are then co-partitioned along with the barcode oligonucleotides described above. As a result, the partitions will include the cell or cells, as well as the bound ligands and their known, associated reporter oligonucleotides.

Without the need for lysing the cells within the partitions, one may then subject the reporter oligonucleotides to the barcoding operations described above for cellular nucleic acids, to produce barcoded reporter oligonucleotides, where the presence of the reporter oligonucleotides can be indicative of the presence of the particular cell surface feature, and the barcode sequence will allow the attribution of the range of different cell surface features to a given individual cell or population of cells based upon the barcode sequence that was co-partitioned with that cell or population of cells. As a result, one may generate a cell-by-cell profile of the cell surface features within a broader population of cells. This aspect of the methods and systems described herein is described in greater detail below.

This example is schematically illustrated in FIG. 5. As shown, a population of cells, represented by cells 502 and 504, is incubated with a library of cell surface associated reagents, e.g., antibodies, cell surface binding proteins, ligands or the like, where each different type of binding group includes an associated nucleic acid reporter molecule, shown as ligands and associated reporter molecules 506, 508, 510 and 512 (with the reporter molecules being indicated by the differently shaded circles). Where the cell expresses the surface features that are bound by the library, the ligands and their associated reporter molecules can become associated or coupled with the cell surface. Individual cells are then partitioned into separate partitions, e.g., droplets 514 and 516, along with their associated ligand/reporter molecules, as well as an individual barcode oligonucleotide bead as described elsewhere herein, e.g., beads 522 and 524, respectively. As with other examples described herein, the barcoded oligonucleotides are released from the beads and used to attach the barcode sequence to the reporter molecules present within each partition, with a barcode that is common to a given partition, but which varies widely among different partitions. For example, as shown in FIG. 5, the reporter molecules that associate with cell 502 in partition 514 are barcoded with barcode sequence 518, while the reporter molecules associated with cell 504 in partition 516 are barcoded with barcode 520. As a result, one is provided with a library of oligonucleotides that reflects the surface ligands of the cell, as reflected by the reporter molecule, but which is substantially attributable to an individual cell by virtue of a common barcode sequence, allowing a single cell level profiling of the surface characteristics of the cell. As will be appreciated, this process is not limited to cell surface receptors but may be used to identify the presence of a wide variety of specific cell structures, chemistries or other characteristics.
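As a non-limiting sketch of how barcoded reporter oligonucleotides might be tallied into a cell-by-cell surface profile, the following example counts reporter sequences per partition barcode. The reporter sequences, the feature names (CD3, CD19), the barcode labels, and the function name `surface_profile` are all hypothetical and are used only to make the example concrete.

```python
from collections import Counter, defaultdict

# Hypothetical lookup from known reporter sequence to the surface feature
# bound by the associated ligand (names chosen only for illustration).
REPORTER_TO_FEATURE = {
    "ACGTAC": "CD3",
    "TGCATG": "CD19",
}


def surface_profile(barcoded_reporters):
    """Count reporter observations per surface feature for each partition barcode."""
    profile = defaultdict(Counter)
    for barcode, reporter in barcoded_reporters:
        feature = REPORTER_TO_FEATURE.get(reporter)
        if feature is not None:  # skip reporter sequences not in the library
            profile[barcode][feature] += 1
    return {barcode: dict(counts) for barcode, counts in profile.items()}


# (barcode, reporter sequence) pairs as they might be recovered from sequencing.
observations = [
    ("BC518", "ACGTAC"),
    ("BC518", "ACGTAC"),
    ("BC520", "TGCATG"),
    ("BC518", "TGCATG"),
    ("BC520", "NNNNNN"),  # unrecognized reporter, ignored
]

profiles = surface_profile(observations)
print(profiles)
```

Raw counts such as these may be inflated by amplification; the sketch makes no attempt to correct for that.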

Single cell processing and analysis methods and systems described herein can be utilized for a wide variety of applications, including analysis of specific individual cells, analysis of different cell types within populations of differing cell types, analysis and characterization of large populations of cells for environmental, human health, epidemiological, forensic, or any of a wide variety of different applications.

A particularly valuable application of the single cell analysis processes described herein is in the sequencing and characterization of a diseased cell. A diseased cell can have altered metabolic properties, gene expression, and/or morphologic features. Examples of diseases include inflammatory disorders, metabolic disorders, nervous system disorders, and cancer.

Of particular interest are cancer cells. In particular, conventional analytical techniques, including the ensemble sequencing processes alluded to above, are not highly adept at picking up small variations in genomic make-up of cancer cells, particularly where those exist in a sea of normal tissue cells. Further, even as between tumor cells, wide variations can exist and can be masked by the ensemble approaches to sequencing (See, e.g., Patel, et al., Single-cell RNA-seq highlights intratumoral heterogeneity in primary glioblastoma, Science DOI: 10.1126/science.1254257 (Published online Jun. 12, 2014)). Cancer cells may be derived from solid tumors, hematological malignancies, cell lines, or obtained as circulating tumor cells, and subjected to the partitioning processes described above. Upon analysis, one can identify individual cell sequences as deriving from a single cell or small group of cells, and distinguish those over normal tissue cell sequences.

Non-limiting examples of cancer cells include cells of cancers such asAcanthoma, Acinic cell carcinoma, Acoustic neuroma, Acral lentiginousmelanoma, Acrospiroma, Acute eosinophilic leukemia, Acute lymphoblasticleukemia, Acute megakaryoblastic leukemia, Acute monocytic leukemia,Acute myeloblastic leukemia with maturation, Acute myeloid dendriticcell leukemia, Acute myeloid leukemia, Acute promyelocytic leukemia,Adamantinoma, Adenocarcinoma, Adenoid cystic carcinoma, Adenoma,Adenomatoid odontogenic tumor, Adrenocortical carcinoma, Adult T-cellleukemia, Aggressive NK-cell leukemia, AIDS-Related Cancers,AIDS-related lymphoma, Alveolar soft part sarcoma, Ameloblastic fibroma,Anal cancer, Anaplastic large cell lymphoma, Anaplastic thyroid cancer,Angioimmunoblastic T-cell lymphoma, Angiomyolipoma, Angiosarcoma,Appendix cancer, Astrocytoma, Atypical teratoid rhabdoid tumor, Basalcell carcinoma, Basal-like carcinoma, B-cell leukemia, B-cell lymphoma,Bellini duct carcinoma, Biliary tract cancer, Bladder cancer, Blastoma,Bone Cancer, Bone tumor, Brain Stem Glioma, Brain Tumor, Breast Cancer,Brenner tumor, Bronchial Tumor, Bronchioloalveolar carcinoma, Browntumor, Burkitt's lymphoma, Cancer of Unknown Primary Site, CarcinoidTumor, Carcinoma, Carcinoma in situ, Carcinoma of the penis, Carcinomaof Unknown Primary Site, Carcinosarcoma, Castleman's Disease, CentralNervous System Embryonal Tumor, Cerebellar Astrocytoma, CerebralAstrocytoma, Cervical Cancer, Cholangiocarcinoma, Chondroma,Chondrosarcoma, Chordoma, Choriocarcinoma, Choroid plexus papilloma,Chronic Lymphocytic Leukemia, Chronic monocytic leukemia, Chronicmyelogenous leukemia, Chronic Myeloproliferative Disorder, Chronicneutrophilic leukemia, Clear-cell tumor, Colon Cancer, Colorectalcancer, Craniopharyngioma, Cutaneous T-cell lymphoma, Degos disease,Dermatofibrosarcoma protuberans, Dermoid cyst, Desmoplastic small roundcell tumor, Diffuse large B cell lymphoma, Dysembryoplasticneuroepithelial tumor, Embryonal 
carcinoma, Endodermal sinus tumor,Endometrial cancer, Endometrial Uterine Cancer, Endometrioid tumor,Enteropathy-associated T-cell lymphoma, Ependymoblastoma, Ependymoma,Epithelioid sarcoma, Erythroleukemia, Esophageal cancer,Esthesioneuroblastoma, Ewing Family of Tumor, Ewing Family Sarcoma,Ewing's sarcoma, Extracranial Germ Cell Tumor, Extragonadal Germ CellTumor, Extrahepatic Bile Duct Cancer, Extramammary Paget's disease,Fallopian tube cancer, Fetus in fetu, Fibroma, Fibrosarcoma, Follicularlymphoma, Follicular thyroid cancer, Gallbladder Cancer, Gallbladdercancer, Ganglioglioma, Ganglioneuroma, Gastric Cancer, Gastric lymphoma,Gastrointestinal cancer, Gastrointestinal Carcinoid Tumor,Gastrointestinal Stromal Tumor, Gastrointestinal stromal tumor, Germcell tumor, Germinoma, Gestational choriocarcinoma, GestationalTrophoblastic Tumor, Giant cell tumor of bone, Glioblastoma multiforme,Glioma, Gliomatosis cerebri, Glomus tumor, Glucagonoma, Gonadoblastoma,Granulosa cell tumor, Hairy Cell Leukemia, Hairy cell leukemia, Head andNeck Cancer, Head and neck cancer, Heart cancer, Hemangioblastoma,Hemangiopericytoma, Hemangiosarcoma, Hematological malignancy,Hepatocellular carcinoma, Hepatosplenic T-cell lymphoma, Hereditarybreast-ovarian cancer syndrome, Hodgkin Lymphoma, Hodgkin's lymphoma,Hypopharyngeal Cancer, Hypothalamic Glioma, Inflammatory breast cancer,Intraocular Melanoma, Islet cell carcinoma, Islet Cell Tumor, Juvenilemyelomonocytic leukemia, Kaposi Sarcoma, Kaposi's sarcoma, KidneyCancer, Klatskin tumor, Krukenberg tumor, Laryngeal Cancer, Laryngealcancer, Lentigo maligna melanoma, Leukemia, Leukemia, Lip and OralCavity Cancer, Liposarcoma, Lung cancer, Luteoma, Lymphangioma,Lymphangiosarcoma, Lymphoepithelioma, Lymphoid leukemia, Lymphoma,Macroglobulinemia, Malignant Fibrous Histiocytoma, Malignant fibroushistiocytoma, Malignant Fibrous Histiocytoma of Bone, Malignant Glioma,Malignant Mesothelioma, Malignant peripheral nerve sheath tumor,Malignant rhabdoid 
tumor, Malignant triton tumor, MALT lymphoma, Mantlecell lymphoma, Mast cell leukemia, Mediastinal germ cell tumor,Mediastinal tumor, Medullary thyroid cancer, Medulloblastoma,Medulloblastoma, Medulloepithelioma, Melanoma, Melanoma, Meningioma,Merkel Cell Carcinoma, Mesothelioma, Mesothelioma, Metastatic SquamousNeck Cancer with Occult Primary, Metastatic urothelial carcinoma, MixedMullerian tumor, Monocytic leukemia, Mouth Cancer, Mucinous tumor,Multiple Endocrine Neoplasia Syndrome, Multiple Myeloma, Multiplemyeloma, Mycosis Fungoides, Mycosis fungoides, Myelodysplastic Disease,Myelodysplastic Syndromes, Myeloid leukemia, Myeloid sarcoma,Myeloproliferative Disease, Myxoma, Nasal Cavity Cancer, NasopharyngealCancer, Nasopharyngeal carcinoma, Neoplasm, Neurinoma, Neuroblastoma,Neuroblastoma, Neurofibroma, Neuroma, Nodular melanoma, Non-HodgkinLymphoma, Non-Hodgkin lymphoma, Nonmelanoma Skin Cancer, Non-Small CellLung Cancer, Ocular oncology, Oligoastrocytoma, Oligodendroglioma,Oncocytoma, Optic nerve sheath meningioma, Oral Cancer, Oral cancer,Oropharyngeal Cancer, Osteosarcoma, Osteosarcoma, Ovarian Cancer,Ovarian cancer, Ovarian Epithelial Cancer, Ovarian Germ Cell Tumor,Ovarian Low Malignant Potential Tumor, Paget's disease of the breast,Pancoast tumor, Pancreatic Cancer, Pancreatic cancer, Papillary thyroidcancer, Papillomatosis, Paraganglioma, Paranasal Sinus Cancer,Parathyroid Cancer, Penile Cancer, Perivascular epithelioid cell tumor,Pharyngeal Cancer, Pheochromocytoma, Pineal Parenchymal Tumor ofIntermediate Differentiation, Pineoblastoma, Pituicytoma, Pituitaryadenoma, Pituitary tumor, Plasma Cell Neoplasm, Pleuropulmonaryblastoma, Polyembryoma, Precursor T-lymphoblastic lymphoma, Primarycentral nervous system lymphoma, Primary effusion lymphoma, PrimaryHepatocellular Cancer, Primary Liver Cancer, Primary peritoneal cancer,Primitive neuroectodermal tumor, Prostate cancer, Pseudomyxomaperitonei, Rectal Cancer, Renal cell carcinoma, Respiratory 
TractCarcinoma Involving the NUT Gene on Chromosome 15, Retinoblastoma,Rhabdomyoma, Rhabdomyosarcoma, Richter's transformation, Sacrococcygealteratoma, Salivary Gland Cancer, Sarcoma, Schwannomatosis, Sebaceousgland carcinoma, Secondary neoplasm, Seminoma, Serous tumor,Sertoli-Leydig cell tumor, Sex cord-stromal tumor, Sezary Syndrome,Signet ring cell carcinoma, Skin Cancer, Small blue round cell tumor,Small cell carcinoma, Small Cell Lung Cancer, Small cell lymphoma, Smallintestine cancer, Soft tissue sarcoma, Somatostatinoma, Soot wart,Spinal Cord Tumor, Spinal tumor, Splenic marginal zone lymphoma,Squamous cell carcinoma, Stomach cancer, Superficial spreading melanoma,Supratentorial Primitive Neuroectodermal Tumor, Surfaceepithelial-stromal tumor, Synovial sarcoma, T-cell acute lymphoblasticleukemia, T-cell large granular lymphocyte leukemia, T-cell leukemia,T-cell lymphoma, T-cell prolymphocytic leukemia, Teratoma, Terminallymphatic cancer, Testicular cancer, Thecoma, Throat Cancer, ThymicCarcinoma, Thymoma, Thyroid cancer, Transitional Cell Cancer of RenalPelvis and Ureter, Transitional cell carcinoma, Urachal cancer, Urethralcancer, Urogenital neoplasm, Uterine sarcoma, Uveal melanoma, VaginalCancer, Verner Morrison syndrome, Verrucous carcinoma, Visual PathwayGlioma, Vulvar Cancer, Waldenstrom's macroglobulinemia, Warthin's tumor,Wilms' tumor, and combinations thereof.

Where cancer cells are to be analyzed, primer sequences useful in any of the various operations for attaching barcode sequences and/or amplification reactions may comprise gene specific sequences which target genes or regions of genes associated with or suspected of being associated with cancer. For example, this can include genes or regions of genes where the presence of mutations (e.g., insertions, deletions, polymorphisms, copy number variations, and gene fusions) associated with a cancerous condition are suspected to be present in a cell population.

As with cancer cell analysis, the analysis and diagnosis of fetal health or abnormality through the analysis of fetal cells is a difficult task using conventional techniques. In particular, in the absence of relatively invasive procedures, such as amniocentesis, obtaining fetal cell samples can employ harvesting those cells from the maternal circulation. As will be appreciated, such circulating fetal cells make up an extremely small fraction of the overall cellular population of that circulation. As a result, complex analyses are performed in order to characterize what of the obtained data is likely derived from fetal cells as opposed to maternal cells. By employing the single cell characterization methods and systems described herein, however, one can attribute genetic make-up to individual cells, and categorize those cells as maternal or fetal based upon their respective genetic make-up. Further, the genetic sequence of fetal cells may be used to identify any of a number of genetic disorders, including, e.g., aneuploidy such as Down syndrome, Edwards syndrome, and Patau syndrome.

Also of interest are immune cells. Methods and compositions disclosed herein can be utilized for sequence analysis of the immune repertoire. Analysis of sequence information underlying the immune repertoire can provide a significant improvement in understanding the status and function of the immune system.

Non-limiting examples of immune cells which can be analyzed utilizing the methods described herein include B cells, T cells (e.g., cytotoxic T cells, natural killer T cells, regulatory T cells, and T helper cells), natural killer cells, cytokine induced killer (CIK) cells; myeloid cells, such as granulocytes (basophil granulocytes, eosinophil granulocytes, neutrophil granulocytes/hypersegmented neutrophils), monocytes/macrophages, mast cells, thrombocytes/megakaryocytes, and dendritic cells. In some embodiments, individual T cells are analyzed using the methods disclosed herein. In some embodiments, individual B cells are analyzed using the methods disclosed herein.

Immune cells express various adaptive immunological receptors relating to immune function, such as T cell receptors and B cell receptors. T cell receptors and B cell receptors play a part in the immune response by specifically recognizing and binding to antigens and aiding in their destruction.

The T cell receptor, or TCR, is a molecule found on the surface of T cells that is generally responsible for recognizing fragments of antigen as peptides bound to major histocompatibility complex (MHC) molecules. The TCR is generally a heterodimer of two chains, each of which is a member of the immunoglobulin superfamily, possessing an N-terminal variable (V) domain, and a C-terminal constant domain. In humans, in 95% of T cells the TCR consists of an alpha (α) and beta (β) chain, whereas in 5% of T cells the TCR consists of gamma and delta (γ/δ) chains. This ratio can change during ontogeny and in diseased states as well as in different species. When the TCR engages with antigenic peptide and MHC (peptide/MHC), the T lymphocyte is activated through signal transduction.

Each of the two chains of a TCR contains multiple copies of gene segments: a variable ‘V’ gene segment, a diversity ‘D’ gene segment, and a joining ‘J’ gene segment. The TCR alpha chain is generated by recombination of V and J segments, while the beta chain is generated by recombination of V, D, and J segments. Similarly, generation of the TCR gamma chain involves recombination of V and J gene segments, while generation of the TCR delta chain occurs by recombination of V, D, and J gene segments. The intersection of these specific regions (V and J for the alpha or gamma chain, or V, D and J for the beta or delta chain) corresponds to the CDR3 region that is important for antigen-MHC recognition. Complementarity determining regions (e.g., CDR1, CDR2, and CDR3), or hypervariable regions, are sequences in the variable domains of antigen receptors (e.g., T cell receptor and immunoglobulin) that can complement an antigen. Most of the diversity of CDRs is found in CDR3, with the diversity being generated by somatic recombination events during the development of T lymphocytes. A unique nucleotide sequence that arises during the gene arrangement process can be referred to as a clonotype.

The B cell receptor, or BCR, is a molecule found on the surface of B cells. The antigen binding portion of a BCR is composed of a membrane-bound antibody that, like most antibodies (e.g., immunoglobulins), has a unique and randomly determined antigen-binding site. The antigen binding portion of a BCR includes a membrane-bound immunoglobulin molecule of one isotype (e.g., IgD, IgM, IgA, IgG, or IgE). When a B cell is activated by its first encounter with a cognate antigen, the cell proliferates and differentiates to generate a population of antibody-secreting plasma B cells and memory B cells. The various immunoglobulin isotypes differ in their biological features, structure, target specificity and distribution. A variety of molecular mechanisms exist to generate initial diversity, including genetic recombination at multiple sites.

The BCR is composed of two genes, IgH and IgK (or IgL), coding for antibody heavy and light chains. Immunoglobulins are formed by recombination among gene segments, sequence diversification at the junctions of these segments, and point mutations throughout the gene. Each heavy chain gene contains multiple copies of three different gene segments: a variable ‘V’ gene segment, a diversity ‘D’ gene segment, and a joining ‘J’ gene segment. Each light chain gene contains multiple copies of two different gene segments for the variable region of the protein: a variable ‘V’ gene segment and a joining ‘J’ gene segment. The recombination can generate a molecule with one of each of the V, D, and J segments. Furthermore, several bases may be deleted and others added (called N and P nucleotides) at each of the two junctions, thereby generating further diversity. After B cell activation, a process of affinity maturation through somatic hypermutation occurs. In this process, progeny cells of the activated B cells accumulate distinct somatic mutations throughout the gene with higher mutation concentration in the CDR regions, leading to the generation of antibodies with higher affinity to the antigens. In addition to somatic hypermutation, activated B cells undergo the process of isotype switching. Antibodies with the same variable segments can have different forms (isotypes) depending on the constant segment. Whereas all naïve B cells express IgM (or IgD), activated B cells mostly express IgG but also IgM, IgA and IgE. This expression switching from IgM (and/or IgD) to IgG, IgA, or IgE occurs through a recombination event causing one cell to specialize in producing a specific isotype. A unique nucleotide sequence that arises during the gene arrangement process can similarly be referred to as a clonotype.

In some embodiments, the methods, compositions and systems disclosed herein are utilized to analyze the various sequences of TCRs and BCRs from immune cells, for example various clonotypes. In some embodiments, methods, compositions and systems disclosed herein are used to analyze the sequence of a TCR alpha chain, a TCR beta chain, a TCR delta chain, a TCR gamma chain, or any fragment thereof (e.g., variable regions including VDJ or VJ regions, constant regions, transmembrane regions, fragments thereof, combinations thereof, and combinations of fragments thereof). In some embodiments, methods, compositions and systems disclosed herein are used to analyze the sequence of a B cell receptor heavy chain, B cell receptor light chain, or any fragment thereof (e.g., variable regions including VDJ or VJ regions, constant regions, transmembrane regions, fragments thereof, combinations thereof, and combinations of fragments thereof).
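As one hedged illustration of clonotype analysis, the sketch below treats each distinct rearranged nucleotide sequence recovered from a cell as a clonotype and reports the fraction of cells carrying it. The cell labels, the example sequences, and the function name `clonotype_frequencies` are hypothetical; in practice the sequence used to define a clonotype (e.g., a CDR3 region together with V and J assignments) would be chosen by the practitioner.

```python
from collections import Counter


def clonotype_frequencies(cell_to_sequence):
    """Return the fraction of cells carrying each distinct rearranged sequence."""
    counts = Counter(cell_to_sequence.values())
    total = sum(counts.values())
    return {sequence: n / total for sequence, n in counts.items()}


# Hypothetical per-cell receptor sequences, one per barcoded cell.
cells = {
    "cell_1": "TGTGCCAGCAGT",
    "cell_2": "TGTGCCAGCAGT",
    "cell_3": "TGTGCCAGCTCA",
    "cell_4": "TGTGCCAGCAGT",
}

frequencies = clonotype_frequencies(cells)
for sequence, fraction in frequencies.items():
    print(sequence, fraction)
```

Exact string matching is used here for simplicity and does not account for sequencing errors or somatic hypermutation.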

Where immune cells are to be analyzed, primer sequences useful in any of the various operations for attaching barcode sequences and/or amplification reactions may comprise gene specific sequences which target genes or regions of genes of immune cell proteins, for example immune receptors. Such gene sequences include, but are not limited to, sequences of various T cell receptor alpha variable genes (TRAV genes), T cell receptor alpha joining genes (TRAJ genes), T cell receptor alpha constant genes (TRAC genes), T cell receptor beta variable genes (TRBV genes), T cell receptor beta diversity genes (TRBD genes), T cell receptor beta joining genes (TRBJ genes), T cell receptor beta constant genes (TRBC genes), T cell receptor gamma variable genes (TRGV genes), T cell receptor gamma joining genes (TRGJ genes), T cell receptor gamma constant genes (TRGC genes), T cell receptor delta variable genes (TRDV genes), T cell receptor delta diversity genes (TRDD genes), T cell receptor delta joining genes (TRDJ genes), and T cell receptor delta constant genes (TRDC genes).

The ability to characterize individual cells from larger diverse populations of cells is also of significant value in both environmental testing as well as in forensic analysis, where samples may, by their nature, be made up of diverse populations of cells and other material that “contaminate” the sample, relative to the cells for which the sample is being tested, e.g., environmental indicator organisms, toxic organisms, and the like for, e.g., environmental and food safety testing, victim and/or perpetrator cells in forensic analysis for sexual assault, and other violent crimes, and the like.

Additionally, the methods and compositions disclosed herein allow the determination of not only the immune repertoire and different clonotypes, but the functional characteristics (e.g., the transcriptome) of the cells associated with a clonotype or plurality of clonotypes that bind to the same or similar antigen. These functional characteristics can comprise transcription of cytokine, chemokine, or cell-surface associated molecules, such as costimulatory molecules, checkpoint inhibitors, cell surface maturation markers, or cell-adhesion molecules. Such analysis allows a cell or cell population expressing a particular T cell receptor, B cell receptor, or immunoglobulin to be associated with certain functional characteristics. For example, for any given antigen there will be multiple clonotypes of T cell receptor, B cell receptor, or immunoglobulin that specifically bind to that antigen. Multiple clonotypes that bind to the same antigen are known as the idiotype.

Additional useful applications of the above described single cell sequencing and characterization processes are in the field of neuroscience research and diagnosis. In particular, neural cells can include long interspersed nuclear elements (LINEs), or ‘jumping’ genes that can move around the genome, which cause each neuron to differ from its neighbor cells. Research has shown that the number of LINEs in human brain exceeds that of other tissues, e.g., heart and liver tissue, with between 80 and 300 unique insertions (See, e.g., Coufal, N. G. et al. Nature 460, 1127-1131 (2009)). These differences have been postulated as being related to a person's susceptibility to neurological disorders (see, e.g., Muotri, A. R. et al. Nature 468, 443-446 (2010)), or as providing the brain with a diversity with which to respond to challenges. As such, the methods described herein may be used in the sequencing and characterization of individual neural cells.

The single cell analysis methods described herein are also useful in the analysis of gene expression, as noted above, both in terms of identification of RNA transcripts and their quantitation. In particular, using the single cell level analysis methods described herein, one can isolate and analyze the RNA transcripts present in individual cells, populations of cells, or subsets of populations of cells. In particular, in some cases, the barcode oligonucleotides may be configured to prime, replicate and consequently yield barcoded fragments of RNA from individual cells. For example, in some cases, the barcode oligonucleotides may include mRNA specific priming sequences, e.g., poly-T primer segments that allow priming and replication of mRNA in a reverse transcription reaction or other targeted priming sequences. Alternatively or additionally, random RNA priming may be carried out using random N-mer primer segments of the barcode oligonucleotides.

FIG. 6 provides a schematic of one example method for RNA expression analysis in individual cells using the methods described herein. As shown, at operation 602 a cell containing sample is sorted for viable cells, which are quantified and diluted for subsequent partitioning. At operation 604, the individual cells are separately co-partitioned with gel beads bearing the barcoding oligonucleotides as described herein. The cells are lysed and the barcoded oligonucleotides released into the partitions at operation 606, where they interact with and hybridize to the mRNA at operation 608, e.g., by virtue of a poly-T primer sequence, which is complementary to the poly-A tail of the mRNA. Using the poly-T barcode oligonucleotide as a priming sequence, a reverse transcription reaction is carried out at operation 610 to synthesize a cDNA transcript of the mRNA that includes the barcode sequence. The barcoded cDNA transcripts are then subjected to additional amplification at operation 612, e.g., using a polymerase chain reaction (PCR) process, and purification at operation 614, before they are placed on a nucleic acid sequencing system for determination of the cDNA sequence and its associated barcode sequence(s). In some cases, as shown, operations 602 through 608 can occur while the reagents remain in their original droplet or partition, while operations 612 through 616 can occur in bulk (e.g., outside of the partition). In the case where a partition is a droplet in an emulsion, the emulsion can be broken and the contents of the droplet pooled in order to complete operations 612 through 616. In some cases, barcode oligonucleotides may be digested with exonucleases after the emulsion is broken. Exonuclease activity can be inhibited by ethylenediaminetetraacetic acid (EDTA) following primer digestion. In some cases, operation 610 may be performed either within the partitions based upon co-partitioning of the reverse transcription mixture, e.g., reverse transcriptase and associated reagents, or it may be performed in bulk.

As noted elsewhere herein, the structure of the barcode oligonucleotides may include a number of sequence elements in addition to the oligonucleotide barcode sequence. One example of a barcode oligonucleotide for use in RNA analysis as described above is shown in FIG. 7. As shown, the overall oligonucleotide 702 is coupled to a bead 704 by a releasable linkage 706, such as a disulfide linker. The oligonucleotide may include functional sequences that are used in subsequent processing, such as functional sequence 708, which may include one or more of a sequencer specific flow cell attachment sequence, e.g., a P5 sequence for Illumina sequencing systems, as well as sequencing primer sequences, e.g., an R1 primer for Illumina sequencing systems. A barcode sequence 710 is included within the structure for use in barcoding the sample RNA. An mRNA specific priming sequence, such as poly-T sequence 712, is also included in the oligonucleotide structure. An anchoring sequence segment 714 may be included to ensure that the poly-T sequence hybridizes at the sequence end of the mRNA. This anchoring sequence can include a random short sequence of nucleotides, e.g., 1-mer, 2-mer, 3-mer or longer sequence, which will ensure that the poly-T segment is more likely to hybridize at the sequence end of the poly-A tail of the mRNA. An additional sequence segment 716 may be provided within the oligonucleotide sequence. In some cases, this additional sequence provides a unique molecular identifier (UMI) sequence segment, e.g., as a random sequence (e.g., such as a random N-mer sequence) that varies across individual oligonucleotides coupled to a single bead, whereas barcode sequence 710 can be constant among oligonucleotides tethered to an individual bead. This unique sequence serves to provide a unique identifier of the starting mRNA molecule that was captured, in order to allow quantitation of the number of original expressed RNA molecules. As will be appreciated, although shown as a single oligonucleotide tethered to the surface of a bead, an individual bead can include tens to hundreds of thousands or even millions of individual oligonucleotide molecules, where, as noted, the barcode segment can be constant or relatively constant for a given bead, but where the variable or unique sequence segment will vary across an individual bead. This unique molecular identifier (UMI) sequence segment may include from 5 to about 8 or more nucleotides within the sequence of the oligonucleotides. In some cases, the unique molecular identifier (UMI) sequence segment can be 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14, 15, 16, 17, 18, 19 or 20 nucleotides in length or longer. In some cases, the unique molecular identifier (UMI) sequence segment can be at least 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14, 15, 16, 17, 18, 19 or 20 nucleotides in length or longer. In some cases, the unique molecular identifier (UMI) sequence segment can be at most 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14, 15, 16, 17, 18, 19 or 20 nucleotides in length or shorter.
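To make the relationship between a constant bead barcode and a varying UMI segment concrete, the following non-limiting sketch generates a few oligonucleotides for one hypothetical bead and computes the number of distinct random N-mers available at a given UMI length (4 possible bases per position). The segment order (functional sequence, barcode, UMI, poly-T), the placeholder sequences, the poly-T length of 20, and the function names are assumptions made for illustration only; the anchoring segment is omitted for brevity.

```python
import random


def umi_space(length):
    """Number of distinct random N-mers of the given length (4 bases per position)."""
    return 4 ** length


def make_bead_oligos(functional, barcode, n_oligos, umi_len, poly_t_len=20, seed=0):
    """Build oligos sharing one barcode but each carrying its own random UMI.

    The segment order used here is an assumption for this sketch.
    """
    rng = random.Random(seed)
    oligos = []
    for _ in range(n_oligos):
        umi = "".join(rng.choice("ACGT") for _ in range(umi_len))
        oligos.append(functional + barcode + umi + "T" * poly_t_len)
    return oligos


# Placeholder sequences; these are not actual P5/R1 or barcode sequences.
FUNCTIONAL = "ACACGACG"
BARCODE = "GGTTCCAA"

oligos = make_bead_oligos(FUNCTIONAL, BARCODE, n_oligos=5, umi_len=8)
for oligo in oligos:
    print(oligo)

print(umi_space(5), umi_space(8))
```

Under these assumptions, a UMI of 5 to 8 nucleotides offers between 4^5 = 1,024 and 4^8 = 65,536 distinct sequences per barcode.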

In operation, and with reference to FIGS. 6 and 7, a cell is co-partitioned along with a barcode bearing bead and lysed while the barcoded oligonucleotides are released from the bead. The poly-T portion of the released barcode oligonucleotide then hybridizes to the poly-A tail of the mRNA. The poly-T segment then primes the reverse transcription of the mRNA to produce a cDNA transcript of the mRNA that also includes each of the sequence segments 708-716 of the barcode oligonucleotide. Again, because the oligonucleotide 702 includes an anchoring sequence 714, it will more likely hybridize to and prime reverse transcription at the sequence end of the poly-A tail of the mRNA. Within any given partition, all of the cDNA transcripts of the individual mRNA molecules will include a common barcode sequence segment 710. However, by including the unique random N-mer sequence, the transcripts made from different mRNA molecules within a given partition will vary at this unique sequence. This provides a quantitation feature that can be identifiable even following any subsequent amplification of the contents of a given partition, e.g., the number of unique segments associated with a common barcode can be indicative of the quantity of mRNA originating from a single partition, and thus, a single cell. As noted above, the transcripts are then amplified, cleaned up and sequenced to identify the sequence of the cDNA transcript of the mRNA, as well as to sequence the barcode segment and the unique sequence segment.
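
The quantitation feature described above can be illustrated with a minimal Python sketch. It assumes that each sequencing read has already been reduced to a (barcode, UMI, gene) tuple; that representation, the function name and the example reads are hypothetical, and the sketch ignores sequencing errors within the UMI.

```python
from collections import defaultdict


def count_molecules(reads):
    """Count distinct UMIs per (barcode, gene) pair.

    `reads` is an iterable of (barcode, umi, gene) tuples. Reads that share a
    barcode, a gene and a UMI are treated as amplification copies of one
    original mRNA molecule and are counted once.
    """
    seen = defaultdict(set)
    for barcode, umi, gene in reads:
        seen[(barcode, gene)].add(umi)
    return {key: len(umis) for key, umis in seen.items()}


# Hypothetical reads: two partitions (barcodes), with amplification duplicates.
example_reads = [
    ("AAAC", "GGTA", "geneX"),
    ("AAAC", "GGTA", "geneX"),  # duplicate of the read above
    ("AAAC", "CTTG", "geneX"),
    ("AAAC", "CTTG", "geneY"),
    ("TTTG", "GGTA", "geneX"),
]
example_counts = count_molecules(example_reads)
```

Here the duplicated read contributes only once, so the count for a given barcode and gene reflects the number of unique segments rather than the number of amplified copies.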

As noted elsewhere herein, while a poly-T primer sequence is described, other targeted or random priming sequences may also be used in priming the reverse transcription reaction. In some cases, the primer sequence can be a gene specific primer sequence which targets specific genes for reverse transcription. In some examples, such target genes comprise T cell receptor genes, B cell receptor genes or immunoglobulin receptor genes. Likewise, although described as releasing the barcoded oligonucleotides into the partition along with the contents of the lysed cells, it will be appreciated that in some cases, the gel bead bound oligonucleotides may be used to hybridize and capture the mRNA on the solid phase of the gel beads, in order to facilitate the separation of the RNA from other cell contents.

An additional example of a barcode oligonucleotide for use in RNA analysis, including messenger RNA (mRNA, including mRNA obtained from a cell) analysis, is shown in FIG. 9A. As shown, the overall oligonucleotide 902 can be coupled to a bead 904 by a releasable linkage 906, such as a disulfide linker. The oligonucleotide may include functional sequences that are used in subsequent processing, such as functional sequence 908, which may include a sequencer specific flow cell attachment sequence, e.g., a P5 sequence for Illumina sequencing systems, as well as functional sequence 910, which may include sequencing primer sequences, e.g., an R1 primer binding site for Illumina sequencing systems. A barcode sequence 912 is included within the structure for use in barcoding the sample RNA. An RNA specific (e.g., mRNA specific) priming sequence, such as poly-T sequence 914, is also included in the oligonucleotide structure. An anchoring sequence segment (not shown) may be included to ensure that the poly-T sequence hybridizes at the sequence end of the mRNA. An additional sequence segment 916 may be provided within the oligonucleotide sequence. This additional sequence can provide a unique molecular identifier (UMI) sequence segment, e.g., as a random N-mer sequence that varies across individual oligonucleotides coupled to a single bead, whereas barcode sequence 912 can be constant among oligonucleotides tethered to an individual bead. As described elsewhere herein, this unique sequence can serve to provide a unique identifier of the starting mRNA molecule that was captured, in order to allow quantitation of the number of original expressed RNA molecules, e.g., mRNA counting. 
As will be appreciated, although shown as a single oligonucleotide tethered to the surface of a bead, individual beads can include tens to hundreds of thousands or even millions of individual oligonucleotide molecules, where, as noted, the barcode segment can be constant or relatively constant for a given bead, but where the variable or unique sequence segment will vary across an individual bead.

In an example method of cellular RNA (e.g., mRNA) analysis and in reference to FIG. 9A, a cell is co-partitioned along with a barcode bearing bead, switch oligo 924, and other reagents such as reverse transcriptase, a reducing agent and dNTPs into a partition (e.g., a droplet in an emulsion). In operation 950, the cell is lysed while the barcoded oligonucleotides 902 are released from the bead (e.g., via the action of the reducing agent) and the poly-T segment 914 of the released barcode oligonucleotide then hybridizes to the poly-A tail of mRNA 920 that is released from the cell. Next, in operation 952 the poly-T segment 914 is extended in a reverse transcription reaction using the mRNA as a template to produce a cDNA transcript 922 that is complementary to the mRNA and also includes each of the sequence segments 908, 912, 910, 916 and 914 of the barcode oligonucleotide. Terminal transferase activity of the reverse transcriptase can add additional bases to the cDNA transcript (e.g., polyC). The switch oligo 924 may then hybridize with the additional bases added to the cDNA transcript and facilitate template switching. A sequence complementary to the switch oligo sequence can then be incorporated into the cDNA transcript 922 via extension of the cDNA transcript 922 using the switch oligo 924 as a template. Within any given partition, all of the cDNA transcripts of the individual mRNA molecules will include a common barcode sequence segment 912. However, by including the unique random N-mer sequence 916, the transcripts made from different mRNA molecules within a given partition will vary at this unique sequence. As described elsewhere herein, this provides a quantitation feature that can be identifiable even following any subsequent amplification of the contents of a given partition, e.g., the number of unique segments associated with a common barcode can be indicative of the quantity of mRNA originating from a single partition, and thus, a single cell. 
Following operation 952, the cDNA transcript 922 is then amplified with primers 926 (e.g., PCR primers) in operation 954. Next, the amplified product is purified (e.g., via solid phase reversible immobilization (SPRI)) in operation 956. At operation 958, the amplified product is then sheared, ligated to additional functional sequences, and further amplified (e.g., via PCR). The functional sequences may include a sequencer specific flow cell attachment sequence 930, e.g., a P7 sequence for Illumina sequencing systems, as well as functional sequence 928, which may include a sequencing primer binding site, e.g., for an R2 primer for Illumina sequencing systems, as well as functional sequence 932, which may include a sample index, e.g., an i7 sample index sequence for Illumina sequencing systems. In some cases, operations 950 and 952 can occur in the partition, while operations 954, 956 and 958 can occur in bulk solution (e.g., in a pooled mixture outside of the partition). In the case where a partition is a droplet in an emulsion, the emulsion can be broken and the contents of the droplet pooled in order to complete operations 954, 956 and 958. In some cases, operation 954 may be completed in the partition. In some cases, barcode oligonucleotides may be digested with exonucleases after the emulsion is broken. Exonuclease activity can be inhibited by ethylenediaminetetraacetic acid (EDTA) following primer digestion. Although described in terms of specific sequence references used for certain sequencing systems, e.g., Illumina systems, it will be understood that the reference to these sequences is for illustration purposes only, and the methods described herein may be configured for use with other sequencing systems incorporating specific priming, attachment, index, and other operational sequences used in those systems, e.g., systems available from Ion Torrent, Oxford Nanopore, Genia, Pacific Biosciences, Complete Genomics, and the like.

In an alternative example of a barcode oligonucleotide for use in RNA (e.g., cellular RNA) analysis as shown in FIG. 9A, functional sequence 908 may be a P7 sequence and functional sequence 910 may be an R2 primer binding site. Moreover, the functional sequence 930 may be a P5 sequence, functional sequence 928 may be an R1 primer binding site, and functional sequence 932 may be an i5 sample index sequence for Illumina sequencing systems. The configuration of the constructs generated by such a barcode oligonucleotide can help minimize (or avoid) sequencing of the poly-T sequence during sequencing.
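
The two complementary configurations described for FIG. 9A can be summarized in a short Python sketch that maps each reference numeral to the sequence type it carries, given the flow cell attachment sequence placed on the bead-bound oligonucleotide. The function name and the dictionary representation are assumptions made for illustration only.

```python
# Sequencing primer binding site and sample index paired with each
# flow cell attachment sequence, as described for FIG. 9A.
PRIMER_SITE = {"P5": "R1", "P7": "R2"}
SAMPLE_INDEX = {"P5": "i5", "P7": "i7"}


def library_layout(bead_attachment: str) -> dict:
    """Return the sequence type for numerals 908, 910, 930, 928 and 932.

    `bead_attachment` is the flow cell attachment sequence on the barcode
    oligonucleotide (functional sequence 908): either "P5" or "P7".
    """
    if bead_attachment not in PRIMER_SITE:
        raise ValueError("bead_attachment must be 'P5' or 'P7'")
    ligated = "P7" if bead_attachment == "P5" else "P5"
    return {
        908: bead_attachment,               # on the barcode oligonucleotide
        910: PRIMER_SITE[bead_attachment],  # on the barcode oligonucleotide
        930: ligated,                       # added after shearing
        928: PRIMER_SITE[ligated],          # added after shearing
        932: SAMPLE_INDEX[ligated],         # added after shearing
    }
```

Calling `library_layout("P5")` reproduces the first configuration, and `library_layout("P7")` reproduces the alternative configuration in which the bead-bound oligonucleotide carries the P7 sequence and R2 primer binding site.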

Shown in FIG. 9B is another example method for RNA analysis, including cellular mRNA analysis. In this method, the switch oligo 924 is co-partitioned with the individual cell and barcoded bead along with reagents such as reverse transcriptase, a reducing agent and dNTPs into a partition (e.g., a droplet in an emulsion). The switch oligo 924 may be labeled with an additional tag 934, e.g., biotin. In operation 951, the cell is lysed while the barcoded oligonucleotides 902 (e.g., as shown in FIG. 9A) are released from the bead (e.g., via the action of the reducing agent). In some cases, sequence 908 is a P7 sequence and sequence 910 is an R2 primer binding site. In other cases, sequence 908 is a P5 sequence and sequence 910 is an R1 primer binding site. Next, the poly-T segment 914 of the released barcode oligonucleotide hybridizes to the poly-A tail of mRNA 920 that is released from the cell. In operation 953, the poly-T segment 914 is then extended in a reverse transcription reaction using the mRNA as a template to produce a cDNA transcript 922 that is complementary to the mRNA and also includes each of the sequence segments 908, 912, 910, 916 and 914 of the barcode oligonucleotide. Terminal transferase activity of the reverse transcriptase can add additional bases to the cDNA transcript (e.g., polyC). The switch oligo 924 may then hybridize with the cDNA transcript and facilitate template switching. A sequence complementary to the switch oligo sequence can then be incorporated into the cDNA transcript 922 via extension of the cDNA transcript 922 using the switch oligo 924 as a template. Next, an isolation operation 960 can be used to isolate the cDNA transcript 922 from the reagents and oligonucleotides in the partition. The additional tag 934, e.g., biotin, can be contacted with an interacting tag 936, e.g., streptavidin, which may be attached to a magnetic bead 938. 
At operation 960 the cDNA can be isolated with a pull-down operation (e.g., via magnetic separation, centrifugation) before amplification (e.g., via PCR) in operation 955, followed by purification (e.g., via solid phase reversible immobilization (SPRI)) in operation 957 and further processing (shearing, ligation of sequences 928, 932 and 930 and subsequent amplification (e.g., via PCR)) in operation 959. In some cases where sequence 908 is a P7 sequence and sequence 910 is an R2 primer binding site, sequence 930 is a P5 sequence and sequence 928 is an R1 primer binding site and sequence 932 is an i5 sample index sequence. In some cases where sequence 908 is a P5 sequence and sequence 910 is an R1 primer binding site, sequence 930 is a P7 sequence and sequence 928 is an R2 primer binding site and sequence 932 is an i7 sample index sequence. In some cases, as shown, operations 951 and 953 can occur in the partition, while operations 960, 955, 957 and 959 can occur in bulk solution (e.g., in a pooled mixture outside of the partition). In the case where a partition is a droplet in an emulsion, the emulsion can be broken and the contents of the droplet pooled in order to complete operation 960. The operations 955, 957, and 959 can then be carried out following operation 960 after the transcripts are pooled for processing.

Shown in FIG. 9C is another example method for RNA analysis, including cellular mRNA analysis. In this method, the switch oligo 924 is co-partitioned with the individual cell and barcoded bead along with reagents such as reverse transcriptase, a reducing agent and dNTPs in a partition (e.g., a droplet in an emulsion). In operation 961, the cell is lysed while the barcoded oligonucleotides 902 (e.g., as shown in FIG. 9A) are released from the bead (e.g., via the action of the reducing agent). In some cases, sequence 908 is a P7 sequence and sequence 910 is an R2 primer binding site. In other cases, sequence 908 is a P5 sequence and sequence 910 is an R1 primer binding site. Next, the poly-T segment 914 of the released barcode oligonucleotide hybridizes to the poly-A tail of mRNA 920 that is released from the cell. Next, in operation 963 the poly-T segment 914 is extended in a reverse transcription reaction using the mRNA as a template to produce a cDNA transcript 922 that is complementary to the mRNA and also includes each of the sequence segments 908, 912, 910, 916 and 914 of the barcode oligonucleotide. Terminal transferase activity of the reverse transcriptase can add additional bases to the cDNA transcript (e.g., polyC). The switch oligo 924 may then hybridize with the cDNA transcript and facilitate template switching. A sequence complementary to the switch oligo sequence can then be incorporated into the cDNA transcript 922 via extension of the cDNA transcript 922 using the switch oligo 924 as a template. Following operation 961 and operation 963, mRNA 920 and cDNA transcript 922 are denatured in operation 962. At operation 964, a second strand is extended from a primer 940 having an additional tag 942, e.g., biotin, and hybridized to the cDNA transcript 922. Also in operation 964, the biotin labeled second strand can be contacted with an interacting tag 936, e.g., streptavidin, which may be attached to a magnetic bead 938. 
The cDNA can be isolated with a pull-down operation (e.g., via magnetic separation, centrifugation) before amplification (e.g., via polymerase chain reaction (PCR)) in operation 965, followed by purification (e.g., via solid phase reversible immobilization (SPRI)) in operation 967 and further processing (shearing, ligation of sequences 928, 932 and 930 and subsequent amplification (e.g., via PCR)) in operation 969. In some cases where sequence 908 is a P7 sequence and sequence 910 is an R2 primer binding site, sequence 930 is a P5 sequence and sequence 928 is an R1 primer binding site and sequence 932 is an i5 sample index sequence. In some cases where sequence 908 is a P5 sequence and sequence 910 is an R1 primer binding site, sequence 930 is a P7 sequence and sequence 928 is an R2 primer binding site and sequence 932 is an i7 sample index sequence. In some cases, operations 961 and 963 can occur in the partition, while operations 962, 964, 965, 967, and 969 can occur in bulk (e.g., outside the partition). In the case where a partition is a droplet in an emulsion, the emulsion can be broken and the contents of the droplet pooled in order to complete operations 962, 964, 965, 967 and 969.

Shown in FIG. 9D is another example method for RNA analysis, including cellular mRNA analysis. In this method, the switch oligo 924 is co-partitioned with the individual cell and barcoded bead along with reagents such as reverse transcriptase, a reducing agent and dNTPs. In operation 971, the cell is lysed while the barcoded oligonucleotides 902 (e.g., as shown in FIG. 9A) are released from the bead (e.g., via the action of the reducing agent). In some cases, sequence 908 is a P7 sequence and sequence 910 is an R2 primer binding site. In other cases, sequence 908 is a P5 sequence and sequence 910 is an R1 primer binding site. Next, the poly-T segment 914 of the released barcode oligonucleotide hybridizes to the poly-A tail of mRNA 920 that is released from the cell. Next, in operation 973, the poly-T segment 914 is extended in a reverse transcription reaction using the mRNA as a template to produce a cDNA transcript 922 that is complementary to the mRNA and also includes each of the sequence segments 908, 912, 910, 916 and 914 of the barcode oligonucleotide. Terminal transferase activity of the reverse transcriptase can add additional bases to the cDNA transcript (e.g., polyC). The switch oligo 924 may then hybridize with the cDNA transcript and facilitate template switching. A sequence complementary to the switch oligo sequence can then be incorporated into the cDNA transcript 922 via extension of the cDNA transcript 922 using the switch oligo 924 as a template. In operation 966, the mRNA 920, cDNA transcript 922 and switch oligo 924 can be denatured, and the cDNA transcript 922 can be hybridized with a capture oligonucleotide 944 labeled with an additional tag 946, e.g., biotin. In this operation, the biotin-labeled capture oligonucleotide 944, which is hybridized to the cDNA transcript, can be contacted with an interacting tag 936, e.g., streptavidin, which may be attached to a magnetic bead 938. 
Following separation from other species (e.g., excess barcoded oligonucleotides) using a pull-down operation (e.g., via magnetic separation, centrifugation), the cDNA transcript can be amplified (e.g., via PCR) with primers 926 at operation 975, followed by purification (e.g., via solid phase reversible immobilization (SPRI)) in operation 977 and further processing (shearing, ligation of sequences 928, 932 and 930 and subsequent amplification (e.g., via PCR)) in operation 979. In some cases where sequence 908 is a P7 sequence and sequence 910 is an R2 primer binding site, sequence 930 is a P5 sequence and sequence 928 is an R1 primer binding site and sequence 932 is an i5 sample index sequence. In other cases where sequence 908 is a P5 sequence and sequence 910 is an R1 primer binding site, sequence 930 is a P7 sequence and sequence 928 is an R2 primer binding site and sequence 932 is an i7 sample index sequence. In some cases, operations 971 and 973 can occur in the partition, while operations 966, 975, 977 (purification), and 979 can occur in bulk (e.g., outside the partition). In the case where a partition is a droplet in an emulsion, the emulsion can be broken and the contents of the droplet pooled in order to complete operations 966, 975, 977 and 979.

Shown in FIG. 9E is another example method for RNA analysis, including cellular RNA analysis. In this method, an individual cell is co-partitioned along with a barcode bearing bead, a switch oligo 990, and other reagents such as reverse transcriptase, a reducing agent and dNTPs into a partition (e.g., a droplet in an emulsion). In operation 981, the cell is lysed while the barcoded oligonucleotides (e.g., 902 as shown in FIG. 9A) are released from the bead (e.g., via the action of the reducing agent). In some cases, sequence 908 is a P7 sequence and sequence 910 is an R2 primer binding site. In other cases, sequence 908 is a P5 sequence and sequence 910 is an R1 primer binding site. Next, the poly-T segment of the released barcode oligonucleotide hybridizes to the poly-A tail of mRNA 920 released from the cell. Next, at operation 983, the poly-T segment is extended in a reverse transcription reaction to produce a cDNA transcript 922 that is complementary to the mRNA and also includes each of the sequence segments 908, 912, 910, 916 and 914 of the barcode oligonucleotide. Terminal transferase activity of the reverse transcriptase can add additional bases to the cDNA transcript (e.g., polyC). The switch oligo 990 may then hybridize with the cDNA transcript and facilitate template switching. A sequence complementary to the switch oligo sequence, and including a T7 promoter sequence, can be incorporated into the cDNA transcript 922. At operation 968, a second strand is synthesized and at operation 970 the T7 promoter sequence can be used by T7 polymerase to produce RNA transcripts in in vitro transcription. At operation 985 the RNA transcripts can be purified (e.g., via solid phase reversible immobilization (SPRI)), reverse transcribed to form DNA transcripts, and a second strand can be synthesized for each of the DNA transcripts. In some cases, prior to purification, the RNA transcripts can be contacted with a DNase (e.g., DNase I) to break down residual DNA. 
At operation 987 the DNA transcripts are then fragmented and ligated to additional functional sequences, such as sequences 928, 932 and 930 and, in some cases, further amplified (e.g., via PCR). In some cases where sequence 908 is a P7 sequence and sequence 910 is an R2 primer binding site, sequence 930 is a P5 sequence and sequence 928 is an R1 primer binding site and sequence 932 is an i5 sample index sequence. In some cases where sequence 908 is a P5 sequence and sequence 910 is an R1 primer binding site, sequence 930 is a P7 sequence and sequence 928 is an R2 primer binding site and sequence 932 is an i7 sample index sequence. In some cases, prior to removing a portion of the DNA transcripts, the DNA transcripts can be contacted with an RNase to break down residual RNA. In some cases, operations 981 and 983 can occur in the partition, while operations 968, 970, 985 and 987 can occur in bulk (e.g., outside the partition). In the case where a partition is a droplet in an emulsion, the emulsion can be broken and the contents of the droplet pooled in order to complete operations 968, 970, 985 and 987.

The approaches of FIGS. 9A-9E may be employed for use with various target regions. In some examples, such target regions are TCR, BCR, and/or immunoglobulin regions. In such examples, oligonucleotides coupled to beads may include primers with sequences that are targeted for such target regions (e.g., constant regions). For example, polyT primer regions can instead be gene specific primer sequences.

Another example of a barcode oligonucleotide for use in RNA analysis, including messenger RNA (mRNA, including mRNA obtained from a cell) analysis, is shown in FIG. 10. As shown, the overall oligonucleotide 1002 is coupled to a bead 1004 by a releasable linkage 1006, such as a disulfide linker. The oligonucleotide may include functional sequences that are used in subsequent processing, such as functional sequence 1008, which may include a sequencer specific flow cell attachment sequence, e.g., a P7 sequence, as well as functional sequence 1010, which may include sequencing primer sequences, e.g., an R2 primer binding site. A barcode sequence 1012 is included within the structure for use in barcoding the sample RNA. An RNA specific (e.g., mRNA specific) priming sequence, such as poly-T sequence 1014, may be included in the oligonucleotide structure. An anchoring sequence segment (not shown) may be included to ensure that the poly-T sequence hybridizes at the sequence end of the mRNA. An additional sequence segment 1016 may be provided within the oligonucleotide sequence. This additional sequence can provide a unique molecular identifier (UMI) sequence segment, as described elsewhere herein. An additional functional sequence 1020 may be included for in vitro transcription, e.g., a T7 RNA polymerase promoter sequence. As will be appreciated, although shown as a single oligonucleotide tethered to the surface of a bead, individual beads can include tens to hundreds of thousands or even millions of individual oligonucleotide molecules, where, as noted, the barcode segment can be constant or relatively constant for a given bead, but where the variable or unique sequence segment will vary across an individual bead.

In an example method of cellular RNA analysis and in reference to FIG. 10, a cell is co-partitioned along with a barcode bearing bead, and other reagents such as reverse transcriptase, reducing agent and dNTPs into a partition (e.g., a droplet in an emulsion). In operation 1050, the cell is lysed while the barcoded oligonucleotides 1002 are released (e.g., via the action of the reducing agent) from the bead, and the poly-T segment 1014 of the released barcode oligonucleotide then hybridizes to the poly-A tail of mRNA 1020. Next, at operation 1052, the poly-T segment is extended in a reverse transcription reaction using the mRNA as template to produce a cDNA transcript 1022 of the mRNA that also includes each of the sequence segments 1020, 1008, 1012, 1010, 1016, and 1014 of the barcode oligonucleotide. Within any given partition, all of the cDNA transcripts of the individual mRNA molecules will include a common barcode sequence segment 1012. However, by including the unique random N-mer sequence, the transcripts made from different mRNA molecules within a given partition will vary at this unique sequence. As described elsewhere herein, this provides a quantitation feature that can be identifiable even following any subsequent amplification of the contents of a given partition, e.g., the number of unique segments associated with a common barcode can be indicative of the quantity of mRNA originating from a single partition, and thus, a single cell. At operation 1054 a second strand is synthesized and at operation 1056 the T7 promoter sequence can be used by T7 polymerase to produce RNA transcripts in in vitro transcription. At operation 1058 the transcripts are fragmented (e.g., sheared), ligated to additional functional sequences, and reverse transcribed. 
The functional sequences may include a sequencer specific flow cell attachment sequence 1030, e.g., a P5 sequence, as well as functional sequence 1028, which may include sequencing primers, e.g., an R1 primer binding sequence, as well as functional sequence 1032, which may include a sample index, e.g., an i5 sample index sequence. At operation 1060 the RNA transcripts can be reverse transcribed to DNA, the DNA amplified (e.g., via PCR), and sequenced to identify the sequence of the cDNA transcript of the mRNA, as well as to sequence the barcode segment and the unique sequence segment. In some cases, operations 1050 and 1052 can occur in the partition, while operations 1054, 1056, 1058 and 1060 can occur in bulk (e.g., outside the partition). In the case where a partition is a droplet in an emulsion, the emulsion can be broken and the contents of the droplet pooled in order to complete operations 1054, 1056, 1058 and 1060.

In an alternative example of a barcode oligonucleotide for use in RNA (e.g., cellular RNA) analysis as shown in FIG. 10, functional sequence 1008 may be a P5 sequence and functional sequence 1010 may be an R1 primer binding site. Moreover, the functional sequence 1030 may be a P7 sequence, functional sequence 1028 may be an R2 primer binding site, and functional sequence 1032 may be an i7 sample index sequence.

An additional example of a barcode oligonucleotide for use in RNA analysis, including messenger RNA (mRNA, including mRNA obtained from a cell) analysis, is shown in FIG. 11. As shown, the overall oligonucleotide 1102 is coupled to a bead 1104 by a releasable linkage 1106, such as a disulfide linker. The oligonucleotide may include functional sequences that are used in subsequent processing, such as functional sequence 1108, which may include a sequencer specific flow cell attachment sequence, e.g., a P5 sequence, as well as functional sequence 1110, which may include sequencing primer sequences, e.g., an R1 primer binding site. In some cases, sequence 1108 is a P7 sequence and sequence 1110 is an R2 primer binding site. A barcode sequence 1112 is included within the structure for use in barcoding the sample RNA. An additional sequence segment 1116 may be provided within the oligonucleotide sequence. In some cases, this additional sequence can provide a unique molecular identifier (UMI) sequence segment, as described elsewhere herein. An additional sequence 1114 may be included to facilitate template switching, e.g., polyG. As will be appreciated, although shown as a single oligonucleotide tethered to the surface of a bead, individual beads can include tens to hundreds of thousands or even millions of individual oligonucleotide molecules, where, as noted, the barcode segment can be constant or relatively constant for a given bead, but where the variable or unique sequence segment will vary across an individual bead.

In an example method of cellular mRNA analysis and in reference to FIG. 11, a cell is co-partitioned along with a microcapsule (e.g., bead bearing a barcoded oligonucleotide), polyT sequence, and other reagents such as a DNA polymerase, a reverse transcriptase, oligonucleotide primers, dNTPs, and reducing agent into a partition (e.g., a droplet in an emulsion). The partition can serve as a reaction volume. As described elsewhere herein, the partition serving as the reaction volume can comprise a container or vessel such as a well, a microwell, vial, a tube, through ports in nanoarray substrates, or micro-vesicles having an outer barrier surrounding an inner fluid center or core, emulsion, or a droplet. In some embodiments, the partition comprises a droplet of aqueous fluid within a non-aqueous continuous phase, e.g., an oil phase. Within the partition, the cell can be lysed and the barcoded oligonucleotides can be released from the bead (e.g., via the action of the reducing agent or other stimulus). Cell lysis and release of the barcoded oligonucleotides from the microcapsule may occur simultaneously in the partition (e.g., a droplet in an emulsion) or the reaction volume. In some embodiments, cell lysis precedes release of the barcoded oligonucleotides from the microcapsule. In some embodiments, release of the barcoded oligonucleotides from the microcapsule precedes cell lysis.

Subsequent to cell lysis and the release of barcoded oligonucleotides from the microcapsule, the reaction volume can be subjected to an amplification reaction to generate an amplification product. In an example amplification reaction, the polyT sequence hybridizes to the polyA tail of mRNA 1120 released from the cell as illustrated in operation 1150. Next, in operation 1152, the polyT sequence is extended in a reverse transcription reaction using the mRNA as a template to produce a cDNA transcript 1122 complementary to the mRNA. Terminal transferase activity of the reverse transcriptase can add additional bases to the cDNA transcript (e.g., polyC) in a template independent manner. The additional bases added to the cDNA transcript, e.g., polyC, can then hybridize with sequence 1114 of the barcoded oligonucleotide. This can facilitate template switching, and a sequence complementary to the barcoded oligonucleotide can be incorporated into the cDNA transcript. In various embodiments, the barcoded oligonucleotide does not hybridize to the template polynucleotide.
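
As a simplified, string-level illustration of operations 1150 and 1152 and the subsequent template switching, the following Python sketch models the cDNA transcript as text. It assumes that the polyT sequence anneals exactly at the junction between the mRNA body and its polyA tail, that exactly three C bases are added, and that sequence 1114 is a matching GGG at the 3′ end of the barcoded oligonucleotide; the function names and these simplifications are illustrative assumptions only.

```python
# Complement table; U is accepted so that mRNA can be written as RNA.
_COMPLEMENT = str.maketrans("ACGTU", "TGCAA")


def revcomp(seq: str) -> str:
    """Reverse complement, returned as DNA, of a DNA or RNA sequence."""
    return seq.translate(_COMPLEMENT)[::-1]


def first_strand(mrna_body: str, poly_t: str) -> str:
    """cDNA (5'->3') from a polyT primer annealed at the body/polyA junction."""
    return poly_t + revcomp(mrna_body)


def template_switch(cdna: str, barcoded_oligo: str, tail: str = "CCC") -> str:
    """Add a non-templated tail, then extend the cDNA along the barcoded oligo.

    The tail pairs with the 3' end of the barcoded oligonucleotide (sequence
    1114 in this sketch); extension then copies the remaining upstream
    portion of the oligonucleotide into the cDNA as its complement.
    """
    handle = revcomp(tail)
    if not barcoded_oligo.endswith(handle):
        raise ValueError("barcoded oligo 3' end does not pair with the cDNA tail")
    upstream = barcoded_oligo[: -len(handle)]
    return cdna + tail + revcomp(upstream)
```

Under these assumptions, the 3′ end of the extended cDNA is the reverse complement of the full barcoded oligonucleotide, so the barcode and any UMI segment become part of the same molecule as the transcript sequence.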

The barcoded oligonucleotide, upon release from the microcapsule, can be present in the reaction volume at any suitable concentration. In some embodiments, the barcoded oligonucleotide is present in the reaction volume at a concentration of about 0.2 μM, 0.3 μM, 0.4 μM, 0.5 μM, 1 μM, 5 μM, 10 μM, 15 μM, 20 μM, 25 μM, 30 μM, 35 μM, 40 μM, 50 μM, 100 μM, 150 μM, 200 μM, 250 μM, 300 μM, 400 μM, or 500 μM. In some embodiments, the barcoded oligonucleotide is present in the reaction volume at a concentration of at least about 0.2 μM, 0.3 μM, 0.4 μM, 0.5 μM, 1 μM, 5 μM, 10 μM, 15 μM, 20 μM, 25 μM, 30 μM, 35 μM, 40 μM, 50 μM, 100 μM, 150 μM, 200 μM, 250 μM, 300 μM, 400 μM, 500 μM or greater. In some embodiments, the barcoded oligonucleotide is present in the reaction volume at a concentration of at most about 0.2 μM, 0.3 μM, 0.4 μM, 0.5 μM, 1 μM, 5 μM, 10 μM, 15 μM, 20 μM, 25 μM, 30 μM, 35 μM, 40 μM, 50 μM, 100 μM, 150 μM, 200 μM, 250 μM, 300 μM, 400 μM, or 500 μM.
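
To relate such concentrations to numbers of molecules, the following Python sketch converts a micromolar concentration into an approximate count of oligonucleotide molecules in a reaction volume. The reaction volume is not specified in this passage, so the nanoliter volume parameter is a hypothetical input chosen by the user; the function name is likewise an assumption.

```python
AVOGADRO = 6.02214076e23  # molecules per mole


def molecules_per_partition(conc_um: float, volume_nl: float) -> float:
    """Approximate number of molecules at `conc_um` (micromolar) in `volume_nl` (nanoliters).

    The volume is a hypothetical input; no partition volume is given in the text.
    """
    moles = (conc_um * 1e-6) * (volume_nl * 1e-9)  # (mol/L) x (L)
    return moles * AVOGADRO
```

For instance, under the assumption of a 1 nL reaction volume, `molecules_per_partition(1.0, 1.0)` evaluates to roughly 6 x 10^8 molecules, and the result scales linearly with both concentration and volume.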

The transcripts can be further processed (e.g., amplified, portions removed, additional sequences added, etc.) and characterized as described elsewhere herein. In some embodiments, the transcripts are sequenced directly. In some embodiments, the transcripts are further processed (e.g., portions removed, additional sequences added, etc.) and then sequenced. In some embodiments, the reaction volume is subjected to a second amplification reaction to generate an additional amplification product. The transcripts or first amplification products can be used as the template for the second amplification reaction. In some embodiments, primers for the second amplification reaction comprise the barcoded oligonucleotide and polyT sequence. In some embodiments, primers for the second amplification reaction comprise additional primers co-partitioned with the cell. In some embodiments, these additional amplification products are sequenced directly. In some embodiments, these additional amplification products are further processed (e.g., portions removed, additional sequences added, etc.) and then sequenced. The configuration of the amplification products (e.g., first amplification products and second amplification products) generated by such a method can help minimize (or avoid) sequencing of the poly-T sequence during sequencing.

An additional example of a barcode oligonucleotide for use in RNA analysis, including cellular RNA analysis, is shown in FIG. 12A. As shown, the overall oligonucleotide 1202 is coupled to a bead 1204 by a releasable linkage 1206, such as a disulfide linker. The oligonucleotide may include functional sequences that are used in subsequent processing, such as functional sequence 1208, which may include a sequencer specific flow cell attachment sequence, e.g., a P5 sequence, as well as functional sequence 1210, which may include sequencing primer sequences, e.g., an R1 primer binding site. In some cases, sequence 1208 is a P7 sequence and sequence 1210 is an R2 primer binding site. A barcode sequence 1212 is included within the structure for use in barcoding the sample RNA. An additional sequence segment 1216 may be provided within the oligonucleotide sequence. In some cases, this additional sequence can provide a unique molecular identifier (UMI) sequence segment, as described elsewhere herein. As will be appreciated, although shown as a single oligonucleotide tethered to the surface of a bead, individual beads can include tens to hundreds of thousands or even millions of individual oligonucleotide molecules, where, as noted, the barcode segment can be constant or relatively constant for a given bead, but where the variable or unique sequence segment will vary across an individual bead. In an example method of cellular RNA analysis using this barcode, a cell is co-partitioned along with a barcode bearing bead and other reagents such as RNA ligase and a reducing agent into a partition (e.g., a droplet in an emulsion). The cell is lysed while the barcoded oligonucleotides are released (e.g., via the action of the reducing agent) from the bead. The barcoded oligonucleotides can then be ligated to the 5′ end of mRNA transcripts while in the partitions by RNA ligase. Subsequent operations may include purification (e.g., via solid phase reversible immobilization (SPRI)) and further processing (shearing, ligation of functional sequences, and subsequent amplification (e.g., via PCR)), and these operations may occur in bulk (e.g., outside the partition). In the case where a partition is a droplet in an emulsion, the emulsion can be broken and the contents of the droplet pooled for the additional operations.

An additional example of a barcode oligonucleotide for use in RNA analysis, including cellular RNA analysis, is shown in FIG. 12B. As shown, the overall oligonucleotide 1222 is coupled to a bead 1224 by a releasable linkage 1226, such as a disulfide linker. The oligonucleotide may include functional sequences that are used in subsequent processing, such as functional sequence 1228, which may include a sequencer specific flow cell attachment sequence, e.g., a P5 sequence, as well as functional sequence 1230, which may include sequencing primer sequences, e.g., an R1 primer binding site. In some cases, sequence 1228 is a P7 sequence and sequence 1230 is an R2 primer binding site. A barcode sequence 1232 is included within the structure for use in barcoding the sample RNA. A priming sequence 1234 (e.g., a random priming sequence) can also be included in the oligonucleotide structure, e.g., a random hexamer. An additional sequence segment 1236 may be provided within the oligonucleotide sequence. In some cases, this additional sequence provides a unique molecular identifier (UMI) sequence segment, as described elsewhere herein. As will be appreciated, although shown as a single oligonucleotide tethered to the surface of a bead, individual beads can include tens to hundreds of thousands or even millions of individual oligonucleotide molecules, where, as noted, the barcode segment can be constant or relatively constant for a given bead, but where the variable or unique sequence segment will vary across an individual bead. In an example method of cellular mRNA analysis using the barcode oligonucleotide of FIG. 12B, a cell is co-partitioned along with a barcode bearing bead and additional reagents such as reverse transcriptase, a reducing agent and dNTPs into a partition (e.g., a droplet in an emulsion). The cell is lysed while the barcoded oligonucleotides are released from the bead (e.g., via the action of the reducing agent). The priming sequence 1234 of random hexamers can randomly hybridize to cellular mRNA. The random hexamer sequence can then be extended in a reverse transcription reaction using mRNA from the cell as a template to produce a cDNA transcript that is complementary to the mRNA and that also includes each of the sequence segments 1228, 1232, 1230, 1236, and 1234 of the barcode oligonucleotide. Subsequent operations may include purification (e.g., via solid phase reversible immobilization (SPRI)) and further processing (shearing, ligation of functional sequences, and subsequent amplification (e.g., via PCR)), and these operations may occur in bulk (e.g., outside the partition). In the case where a partition is a droplet in an emulsion, the emulsion can be broken and the contents of the droplet pooled for additional operations. Additional reagents that may be co-partitioned along with the barcode bearing bead may include oligonucleotides to block ribosomal RNA (rRNA) and nucleases to digest genomic DNA and cDNA from cells. Alternatively, rRNA removal agents may be applied during additional processing operations. The configuration of the constructs generated by such a method can help minimize (or avoid) sequencing of the poly-T sequence during sequencing.
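As a non-limiting illustration, the structure of the cDNA transcript produced by random-primed reverse transcription with the oligonucleotide of FIG. 12B can be sketched in code. All sequences below are arbitrary placeholders chosen for this sketch (they are not actual P5, R1, barcode, or UMI sequences), the function names are hypothetical, and the model assumes perfect base pairing of the priming sequence at the first matching site on the mRNA.

```python
# Placeholder sequences only; none are real adaptor, barcode, or UMI sequences.
COMPLEMENT = {"A": "T", "C": "G", "G": "C", "T": "A", "U": "A"}


def reverse_complement_dna(seq: str) -> str:
    """Reverse complement of a DNA or RNA string, written with DNA letters."""
    return "".join(COMPLEMENT[base] for base in reversed(seq))


def build_barcoded_oligo(segments: dict) -> str:
    """Join the segments 5' to 3' in the order 1228, 1232, 1230, 1236, 1234."""
    order = ("1228", "1232", "1230", "1236", "1234")
    return "".join(segments[key] for key in order)


def reverse_transcribe(oligo: str, primer_len: int, mrna: str):
    """Extend the 3' priming sequence of `oligo` along `mrna` (written 5' to 3').

    Returns the cDNA (oligo plus the copied upstream mRNA region), or None
    when the priming sequence finds no perfectly complementary site.
    """
    primer = oligo[-primer_len:]
    # RNA site (5' to 3') that pairs antiparallel with the DNA primer.
    site = reverse_complement_dna(primer).replace("T", "U")
    start = mrna.find(site)
    if start < 0:
        return None
    # Extension copies the mRNA region lying 5' of the annealing site.
    return oligo + reverse_complement_dna(mrna[:start])


segments = {
    "1228": "AAAAACCCCC",        # flow cell attachment sequence (placeholder)
    "1232": "ACGTACGTACGTACGT",  # barcode sequence (placeholder)
    "1230": "GGGGGTTTTT",        # sequencing primer site (placeholder)
    "1236": "GATTACAGAT",        # UMI segment (placeholder)
    "1234": "TTGCAC",            # random hexamer priming sequence (placeholder)
}
oligo = build_barcoded_oligo(segments)
cdna = reverse_transcribe(oligo, len(segments["1234"]), "AUGGCCAAGUGCAAUUCG")
```

In this sketch the resulting cDNA carries every segment of the barcode oligonucleotide at its 5′ end, followed by the sequence complementary to the mRNA region upstream of the priming site.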

The single cell analysis methods described herein may also be useful in the analysis of the whole transcriptome. Referring back to the barcode of FIG. 12B, the priming sequence 1234 may be a random N-mer. In some cases, sequence 1228 is a P7 sequence and sequence 1230 is an R2 primer binding site. In other cases, sequence 1228 is a P5 sequence and sequence 1230 is an R1 primer binding site. In an example method of whole transcriptome analysis using this barcode, the individual cell is co-partitioned along with a barcode bearing bead, poly-T sequence, and other reagents such as reverse transcriptase, polymerase, a reducing agent and dNTPs into a partition (e.g., droplet in an emulsion). In an operation of this method, the cell is lysed while the barcoded oligonucleotides are released from the bead (e.g., via the action of the reducing agent) and the poly-T sequence hybridizes to the poly-A tail of cellular mRNA. In a reverse transcription reaction using the mRNA as template, cDNA transcripts of cellular mRNA can be produced. The RNA can then be degraded with an RNase. The priming sequence 1234 in the barcoded oligonucleotide can then randomly hybridize to the cDNA transcripts. The oligonucleotides can be extended using polymerase enzymes and other extension reagents co-partitioned with the bead and cell, in a manner similar to that shown in FIG. 3, to generate amplification products (e.g., barcoded fragments) similar to the example amplification product shown in FIG. 3 (panel F). The barcoded nucleic acid fragments may, in some cases, be subjected to further processing (e.g., amplification, addition of additional sequences, clean up processes, etc., as described elsewhere herein) and characterized, e.g., through sequence analysis. In this operation, sequencing signals can come from full length RNA.

In some embodiments, the barcode sequence can be appended to the 3′ end of the template polynucleotide sequence (e.g., mRNA). Such a configuration may be desired, for example, if the sequence at the 3′ end of the template polynucleotide is to be analyzed.

In some embodiments, the barcode sequence can be appended to the 5′ end of a template polynucleotide sequence (e.g., mRNA). Such a configuration may be desired, for example, if the sequence at the 5′ end of the template polynucleotide is to be analyzed.

In some embodiments, a barcode sequence can be appended to the 3′ end of a first subset of the template polynucleotides, and a barcode sequence can be appended to the 5′ end of a second subset of the template polynucleotides. In some embodiments, the first subset of template polynucleotides and the second subset of template polynucleotides are appended to barcode sequences in the same partition. In some cases, the barcodes appended to the 3′ ends of template polynucleotides are different from the barcodes appended to the 5′ ends of template polynucleotides. For example, the barcodes appended to the 3′ ends may have a different barcode sequence compared to the barcodes appended to the 5′ end. In some cases, the barcodes appended to the 3′ ends of template polynucleotides have the same barcode sequence as the barcodes appended to the 5′ ends of template polynucleotides. In some cases, beads are used to deliver the barcode oligonucleotides to partitions. The different barcodes can be attached to the same or different bead.

A barcode sequence can be appended to the 5′ end of a template polynucleotide sequence by any suitable method. In some cases, the template polynucleotide is a messenger RNA (mRNA) molecule. The barcode sequence can be appended to the 5′ end of a template polynucleotide sequence by use of a primer comprising the barcode sequence in a primer extension reaction. For example, the barcode may be present in a primer used for a primer extension reaction in which the template polynucleotide or a derivative thereof, for example an amplification product, is used as the template for primer extension. In some cases, the barcode may be present on a template switching oligonucleotide participating in a primer extension reaction. As an alternative, the barcode sequence can be appended to the 5′ end of a template polynucleotide by ligating an oligonucleotide comprising the barcode sequence directly to the template polynucleotide.

In another aspect, the present disclosure provides a method of appending a barcode sequence to the 5′ end of a template polynucleotide sequence by a primer extension reaction using a primer comprising a barcode sequence and the template polynucleotide or a derivative thereof as the template for primer extension. The primer extension reaction may occur in a partition. In some embodiments, a cell, or a nucleic acid derivative thereof, is co-partitioned with a primer capable of primer extension and a template switching oligo comprising a barcode sequence. The primer capable of primer extension may hybridize to a nucleic acid of the cell or to a nucleic acid derivative. In some cases, the template switching oligo comprising the barcode sequence is releasably attached to a bead, e.g., a gel bead. In some embodiments, a cell, or a nucleic acid derivative thereof, is co-partitioned with a primer having a sequence towards a 3′ end that hybridizes to the template polynucleotide, a template switching oligonucleotide having a first predefined sequence towards a 5′ end, and a microcapsule, such as a bead, having barcoded oligonucleotides releasably coupled thereto. In some embodiments, the oligonucleotides coupled to the bead include barcode sequences that are identical (e.g., all oligonucleotides sharing the same barcode sequence). In some aspects, the oligonucleotides coupled to the beads additionally include unique molecular identifier (UMI) sequence segments (e.g., all oligonucleotides having different unique molecular identifier sequences).

In an example, FIG. 18 shows a barcoded oligonucleotide coupled to a bead. As shown, the overall oligonucleotide 1802 is coupled to a bead 1804 by a releasable linkage 1806, such as a disulfide linker. The oligonucleotide may include functional sequences that are useful for subsequent processing, such as functional sequence 1808, which may include a sequencer specific flow cell attachment sequence, e.g., a P5 sequence, as well as functional sequence 1810, which may include sequencing primer sequences, e.g., an R1 primer binding site. In some cases, sequence 1808 is a P7 sequence and sequence 1810 is an R2 primer binding site. A barcode sequence 1812 can be included within the structure for use in barcoding the template polynucleotide. The functional sequences may be selected for compatibility with a variety of different sequencing systems, e.g., 454 Sequencing, Ion Torrent Proton or PGM, Illumina X10, etc., and the requirements thereof. In many cases, the barcode sequence 1812, functional sequences 1808 (e.g., flow cell attachment sequence) and 1810 (e.g., sequencing primer sequences) may be common to all of the oligonucleotides attached to a given bead. The barcoded oligonucleotide can also comprise a sequence 1816 to facilitate template switching (e.g., a polyG sequence). In some cases, the additional sequence provides a unique molecular identifier (UMI) sequence segment, as described elsewhere herein. The one or more functional sequences that may be present in an oligonucleotide can be arranged in any suitable order.

Although shown as a single oligonucleotide tethered to the surface of a bead, individual beads can include tens to hundreds of thousands or even millions of individual oligonucleotide molecules, where, as previously noted herein, the barcode sequence can be constant or relatively constant for a given bead.

In an example method of generating labeled polynucleotides using a barcode oligonucleotide, a cell or a nucleic acid derived therefrom is co-partitioned with a bead bearing a barcoded oligonucleotide and reagents such as reverse transcriptase, poly-T primers, dNTPs, and a chemical stimulus (e.g., reducing agent) into a partition. The barcoded oligonucleotide attached to the bead can comprise a sequence to facilitate template switching (e.g., polyG or riboG). The partition can be a droplet in an emulsion. In cases where a cell is provided in the partition, the partition can further comprise a lysis reagent to lyse the cell.

Where the bead is a degradable or disruptable bead, the barcoded oligonucleotide can be released from the bead when contacted with the chemical stimulus (e.g., reducing agent). Following release from the bead, the barcoded oligonucleotide can be present in the partition at any suitable concentration. In some embodiments, the barcoded oligonucleotide is present in the partition at a concentration that is suitable for generating a sufficient yield of amplification products for downstream processing and analysis, including, but not limited to, sequencing adaptor attachment and sequencing analysis.

With reference to FIG. 19A, in 1901A, an oligonucleotide with a poly-T sequence 1914A, and in some cases an additional sequence 1916A that binds to, for example, a sequencing or PCR primer, anneals to a target mRNA 1920A. In 1902A, the oligonucleotide is extended yielding an anti-sense strand 1922A which is appended by multiple cytidines on the 3′ end. In 1903A, the template switching sequence 1990A (e.g., polyG or riboG) of the barcoded oligonucleotide pairs with the cytidines of the anti-sense strand 1922A and the anti-sense strand is extended using the barcoded oligonucleotide as template. In addition to the riboG sequence, the barcoded oligonucleotide can comprise additional functional sequences 1908A, 1912A, and 1910A. In some cases, the barcoded oligonucleotide comprises a unique molecular identifier (UMI, for example 1908A), a barcode sequence (for example 1912A), and a Read 1 sequence (R1, for example 1910A). Operations 1901A, 1902A, and 1903A may be performed in the partition (e.g., droplet or well). The extension in 1902A and 1903A can be facilitated by an enzyme comprising polymerase activity. For example, the extension can be facilitated by a DNA-dependent polymerase or a reverse-transcriptase (e.g., RNA dependent). In some embodiments, the extension comprises polymerase chain reaction. In some embodiments, the extension comprises reverse transcription. The enzyme can add nucleotides in a template independent manner. In some cases, at least three cytidines are appended to the 3′ end of the cDNA transcript in a template independent manner.
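Operations 1901A through 1903A can be illustrated, in a simplified and non-limiting way, by the string-level sketch below. The sequences are arbitrary placeholders, the function names are hypothetical, and several simplifying assumptions are made for the sketch only: the poly-T primer is taken to span the entire poly-A tail, exactly three non-templated cytidines are appended, the riboG template switching sequence is written with DNA letters as GGG, and the barcoded oligonucleotide is assumed to be ordered 5′ R1, barcode, UMI, GGG 3′ (the disclosure notes that functional sequences can be arranged in any suitable order).

```python
COMP = {"A": "T", "C": "G", "G": "C", "T": "A", "U": "A"}


def revcomp(seq: str) -> str:
    """Reverse complement of a DNA or RNA string, written with DNA letters."""
    return "".join(COMP[base] for base in reversed(seq))


def first_strand(mrna: str, primer_tail: str, n_cytidines: int = 3) -> str:
    """Operations 1901A and 1902A: poly-T priming, extension, non-templated C addition."""
    body = mrna.rstrip("A")             # transcript without its poly-A tail
    poly_a_len = len(mrna) - len(body)
    if poly_a_len == 0:
        raise ValueError("mRNA has no poly-A tail for the poly-T primer to anneal to")
    primer = primer_tail + "T" * poly_a_len
    return primer + revcomp(body) + "C" * n_cytidines


def template_switch(antisense: str, barcoded_oligo: str, ts_len: int = 3) -> str:
    """Operation 1903A: pair the oligo's 3' G run with the C run and extend the strand."""
    ts_sequence = barcoded_oligo[-ts_len:]
    if not antisense.endswith(revcomp(ts_sequence)):
        return antisense                # no pairing, so no extension in this model
    # The anti-sense strand copies the rest of the barcoded oligo as template.
    return antisense + revcomp(barcoded_oligo[:-ts_len])


# Placeholder sequences (assumptions for illustration only).
mrna = "GGAUCCGU" + "AAAAAA"
barcoded_oligo = "TCTACACT" + "AACCGGTT" + "ACGTAC" + "GGG"  # R1 + barcode + UMI + GGG
antisense = first_strand(mrna, primer_tail="CTACACGA")
product = template_switch(antisense, barcoded_oligo)
```

In this model the extended anti-sense strand ends with the reverse complement of the entire barcoded oligonucleotide, so the complement of the barcode and UMI is located adjacent to the sequence derived from the 5′ end of the mRNA.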

Subsequent to 1903A, the nucleic acid product (e.g., cDNA product) may be released from the partition and subject to further processing reactions such as additional amplification. In some cases, the nucleic acid product is pooled with products from other partitions for subsequent processing in bulk. In some cases, a portion of the amplified product can be subjected to enrichment to obtain a subset of nucleic acids corresponding to genes of interest.

In some cases, enrichment to obtain a subset of nucleic acids corresponding to genes of interest comprises one or more amplification reactions. One or more gene specific primers can be used for primer extension using the cDNA molecule as a template. Any of a variety of polymerases can be used in embodiments herein for primer extension, non-limiting examples of which include exonuclease minus DNA Polymerase I large (Klenow) Fragment, Phi29 DNA polymerase, Taq DNA Polymerase, T4 DNA polymerase, T7 DNA polymerase, and the like. Further examples of polymerase enzymes that can be used in embodiments herein include thermostable polymerases, including but not limited to, Thermus thermophilus HB8; Thermus oshimai; Thermus scotoductus; Thermus thermophilus 1B21; Thermus thermophilus GK24; Thermus aquaticus polymerase AmpliTaq® FS or Taq (G46D; F667Y), Taq (G46D; F667Y; E681I), and Taq (G46D; F667Y; T664N; R660G); Pyrococcus furiosus polymerase; Thermococcus gorgonarius polymerase; Pyrococcus species GB-D polymerase; Thermococcus sp. (strain 9 deg. N-7) polymerase; Bacillus stearothermophilus polymerase; Tsp polymerase; Thermus flavus polymerase; Thermus litoralis polymerase; Thermus Z05 polymerase; delta Z05 polymerase (e.g., delta Z05 Gold DNA polymerase); and mutants, variants, or derivatives thereof. In some embodiments, a hot start polymerase is used. A hot start polymerase is a modified form of a DNA polymerase that can be activated by incubation at elevated temperatures.

Additional functional sequences can be added to the nucleic acid product or an amplification product thereof. The additional functional sequences may allow for amplification or sample identification. This may occur in the partition or, alternatively, in bulk. In some cases, the amplification products can be sheared, ligated to adapters and amplified to add additional functional sequences. In some cases, both the enriched and unenriched amplification products are subject to analysis.

In an example method of cellular polynucleotide analysis using the barcode oligonucleotide of FIG. 18, a cell is co-partitioned along with a bead bearing a barcoded oligonucleotide and additional reagents such as reverse transcriptase, primers, oligonucleotides (e.g., template switching oligonucleotides), dNTPs, and reducing agent into a partition (e.g., a droplet in an emulsion). Within the partition, the cell can be lysed to yield a plurality of template polynucleotides (e.g., DNA such as genomic DNA, RNA such as mRNA, etc.). In some cases, the cell is lysed using a lysis reagent that is co-partitioned with the cell.

Where the bead is a degradable or disruptable bead, the barcoded oligonucleotide can be released from the bead following the application of stimulus as previously described herein. Following release from the bead, the barcoded oligonucleotide can be present in the partition at any suitable concentration. In some embodiments, the barcoded oligonucleotide is present in the partition at a concentration that is suitable for generating a sufficient yield of amplification products for downstream processing and analysis, including, but not limited to, sequencing adaptor attachment and sequencing analysis. In some embodiments, the concentration of the barcoded oligonucleotide is limited by the loading capacity of the barcode bearing bead, or the amount of oligonucleotides deliverable by the bead.

The template switching oligonucleotide, which can be co-partitioned with the cell, bead bearing barcoded oligonucleotides, etc., can be present in the partition at any suitable concentration. In some embodiments, the template switching oligonucleotide is present in the partition at a concentration that is suitable for efficient template switching during an amplification reaction. The concentration of the template switching oligonucleotide can be dependent on the reagents used for droplet generation. In some embodiments, the template switching oligonucleotide is among a plurality of template switching oligonucleotides.

In some embodiments, the barcoded oligonucleotide and template switching oligonucleotide are present in the partition at similar concentrations. In some embodiments, the barcoded oligonucleotide and template switching oligonucleotides may be present in proportions reflective of the desired amount of amplification products to be generated using each oligonucleotide. In some embodiments, the template switching oligonucleotide is present in the partition at a greater concentration than the barcoded oligonucleotide. This difference in concentration can be due to limitations on the capacity of the barcode bearing bead. In some embodiments, the concentration of the template switching oligonucleotide in the reaction volume is at least 2, 5, 10, 20, 50, 100, 200 or more times that of the concentration of the barcoded oligonucleotide in the same reaction volume when the barcoded oligonucleotide is free in the partition (e.g., not attached to the bead).

As illustrated in FIG. 19B, a reaction mixture comprising a template polynucleotide from a cell 1920B and (i) the primer 1924B having a sequence towards a 3′ end that hybridizes to the template polynucleotide (e.g., polyT) and an additional sequence element 1900B and (ii) a template switching oligonucleotide 1926B that comprises a first predefined sequence 1810 towards a 5′ end can be subjected to an amplification reaction to yield a first amplification product. In some cases, the template polynucleotide is an mRNA with a polyA tail and the primer that hybridizes to the template polynucleotide comprises a polyT sequence towards a 3′ end, which is complementary to the polyA segment. The first predefined sequence can comprise at least one of an adaptor sequence, a barcode sequence, a unique molecular identifier (UMI) sequence, a primer binding site, and a sequencing primer binding site or any combination thereof. In some cases, the first predefined sequence 1810 is a sequence that can be common to all partitions of a plurality of partitions. For example, the first predefined sequence may comprise a flow cell attachment sequence, an amplification primer binding site, or a sequencing primer binding site and the first amplification reaction facilitates the attachment of the predefined sequence to the template polynucleotide from the cell. In some embodiments, the first predefined sequence comprises a primer binding site. In some embodiments, the first predefined sequence comprises a sequencing primer binding site. In some embodiments, the first predefined sequence comprises a barcode sequence. As illustrated in operation 1950B, the sequence towards a 3′ end (e.g., polyT) of the primer 1924B hybridizes to the template polynucleotide 1920B. In a first amplification reaction, extension reaction reagents, e.g., reverse transcriptase, nucleoside triphosphates, co-factors (e.g., Mg2+ or Mn2+), that are also co-partitioned, can extend the primer 1924B sequence using the cell's nucleic acid as a template, to produce a transcript, e.g., cDNA transcript, 1922B having a fragment complementary to the nucleic acid to which the primer annealed. In some cases, the reverse transcriptase has terminal transferase activity and the reverse transcriptase adds additional nucleotides, e.g., polyC, to the cDNA transcript in a template independent manner. As illustrated in operation 1952B, the template switching oligonucleotide 1926B, for example a template switching oligonucleotide which includes a polyG sequence, can hybridize to the cDNA transcript 1922B and facilitate template switching in the first amplification reaction. The transcript, therefore, may comprise the sequence of the primer 1924B, a sequence complementary to the template polynucleotide from the cell, and a sequence complementary to the template switching oligonucleotide.

Among a plurality of partitions, the primer and template switching oligonucleotide may be universal to all partitions. The partitions may individually contain more than one cell, one cell, no cells, or nucleic acids derived from a cell. Where analysis of mRNA is desired, for example, the primer may comprise at least a polyT segment capable of hybridizing and priming an extension reaction from the polyA segment of an mRNA. Where analysis of a variety of polynucleotides is desired, the primer may comprise a random sequence capable of hybridizing to and priming extension reactions randomly on various polynucleotide templates. As template switching can occur with the use of an enzyme having terminal transferase activity, a template switching oligonucleotide having a sequence capable of hybridizing to the appended bases can be used for template switching in a manner that is independent of the sequence of the polynucleotide templates to be analyzed. In some embodiments, the template switching oligonucleotide can comprise a first predefined sequence towards a 5′ end that does not specifically hybridize to the template. In some embodiments, analysis of particular genes is desired. In such cases, the primer may comprise a gene specific sequence capable of hybridizing to and priming extension reactions from templates comprising specific genes. In some embodiments, multiple genes are to be analyzed and a primer is among a plurality of primers. Individual primers of the plurality may target different genes. Each of the plurality of primers may have a sequence for a particular gene.

Subsequent to the first amplification reaction, the first amplification product or transcript can be subjected to a second amplification reaction to generate a second amplification product. In some cases, additional sequences (e.g., functional sequences such as flow cell attachment sequence, sequencing primer binding sequences, barcode sequences, etc.) are to be attached. The first and second amplification reactions can be performed in the same volume, such as for example in a droplet or well. In some cases, the first amplification product is subjected to a second amplification reaction in the presence of a barcoded oligonucleotide to generate a second amplification product having a barcode sequence. The barcode sequence can be unique to a partition, that is, each partition has a unique barcode sequence. The barcoded oligonucleotide may comprise a sequence of at least a segment of the template switching oligonucleotide and at least a second predefined sequence. The segment of the template switching oligonucleotide on the barcoded oligonucleotide can facilitate hybridization of the barcoded oligonucleotide to the transcript, e.g., cDNA transcript, to facilitate the generation of a second amplification product. In addition to a barcode sequence, the barcoded oligonucleotide may comprise a second predefined sequence such as at least one of an adaptor sequence, a unique molecular identifier (UMI) sequence, a primer binding site, and a sequencing primer binding site or any combination thereof.
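Because the barcode sequence can be unique to a partition, sequencing reads derived from the second amplification products can, in principle, be grouped by barcode and distinct molecules counted by UMI. The sketch below illustrates one such grouping. The read layout (barcode immediately followed by UMI at the start of each read), the default lengths of 16 and 10 bases, and the function name are all assumptions made for illustration and are not specified by the disclosure; sequencing errors are ignored.

```python
from collections import defaultdict


def count_umis_per_barcode(reads, barcode_len=16, umi_len=10):
    """Group reads by leading barcode and count distinct UMIs per barcode.

    Assumes each read starts with the barcode, immediately followed by the UMI.
    Reads too short to contain both are skipped.
    """
    umis_by_barcode = defaultdict(set)
    for read in reads:
        if len(read) < barcode_len + umi_len:
            continue
        barcode = read[:barcode_len]
        umi = read[barcode_len:barcode_len + umi_len]
        umis_by_barcode[barcode].add(umi)
    return {barcode: len(umis) for barcode, umis in umis_by_barcode.items()}


# Toy example with shortened (hypothetical) barcode and UMI lengths.
toy_reads = ["AAAACCCGGG", "AAAACCCTTT", "AAAAGGGTTT", "TTTTCCCAAA", "TT"]
toy_counts = count_umis_per_barcode(toy_reads, barcode_len=4, umi_len=3)
```

Reads sharing a barcode are attributed to the same partition, and reads sharing both a barcode and a UMI are treated as copies of a single original molecule.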

In some embodiments, the second amplification reaction uses the first amplification product as a template and the barcoded oligonucleotide as a primer. As illustrated in operation 1954B, the segment of the template switching oligonucleotide on the barcoded oligonucleotide 1928B can hybridize to the portion of the cDNA transcript or complementary fragment 1922B having a sequence complementary to the template switching oligonucleotide or that which was copied from the template switching oligonucleotide. In the second amplification reaction, extension reaction reagents, e.g., polymerase, nucleoside triphosphates, co-factors (e.g., Mg2+ or Mn2+), that are also co-partitioned, can extend the primer sequence using the first amplification product as template as illustrated in operation 1956B. The second amplification product can comprise a second predefined sequence (e.g., 1808, 1812, and 1810), a sequence of a segment of the template polynucleotide (e.g., mRNA), and a sequence complementary to the primer (e.g., 1924B). In cases where the template polynucleotide is an mRNA molecule, amplification products derived therefrom can comprise the corresponding DNA sequence, for example thymine instead of uracil bases.

In some embodiments, the second amplification reaction uses the barcoded oligonucleotide as a template and at least a portion of the first amplification product as a primer. As illustrated in operation 1954B, the segment of the first amplification product (e.g., cDNA transcript) having a sequence complementary to the template switching oligonucleotide can hybridize to the segment of the barcoded oligonucleotide comprising a sequence of at least a segment of the template switching oligonucleotide. In the second amplification reaction, extension reaction reagents, e.g., polymerase, nucleoside triphosphates, co-factors (e.g., Mg2+ or Mn2+), that are also co-partitioned, can extend the primer sequence (e.g., first amplification product) using the barcoded oligonucleotide as template as illustrated in operation 1958B. The second amplification product may comprise the sequence of the primer (e.g., 1924B), a sequence which is complementary to the sequence of the template polynucleotide (e.g., mRNA), and a sequence complementary to the second predefined sequence (e.g., 1808, 1812, and 1810).

In some embodiments, the second amplification reaction is performed subsequent to the first amplification reaction with an intervening purification step. An intervening purification step can be used, for example, to purify the template (e.g., first amplification product) from excess reagents, including excess primers such as template switching oligonucleotides. In some embodiments, the amplification reaction is performed in the absence of an intervening purification step. In certain embodiments, an intervening purification step is not performed so that all sample preparation is performed in a same reaction volume. In the absence of an intervening purification step, the template switching oligonucleotide may compete with the barcoded oligonucleotide in the second amplification reaction, as the barcoded oligonucleotide comprises at least a segment of the template switching oligonucleotide. Competition between the template switching oligonucleotide and barcoded oligonucleotide in the second amplification reaction to generate additional amplification product may result in a second amplification product lacking a barcode sequence. Such amplification products may be undesirable because they lack a barcode sequence, which can provide unique identifying information of the template. In some embodiments, the template switching oligonucleotide may out-compete the barcoded oligonucleotide in the second amplification reaction if the template switching oligonucleotide is present at a higher concentration in the reaction volume than the barcoded oligonucleotide. Various approaches can be utilized to favor the use of the barcoded oligonucleotide in the second amplification reaction to generate amplification products having a barcode sequence in situations where the barcoded oligonucleotide is present at a lower concentration than the template switching oligonucleotide in the reaction volume.
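The effect of such competition can be illustrated with a deliberately simple toy model, offered only as a sketch. The model assumes, without support from the disclosure, that each oligonucleotide primes the second amplification reaction in proportion to its concentration multiplied by a relative rate factor; the function name and the rate parameters are hypothetical.

```python
def barcoded_fraction(barcoded_conc_uM: float, tso_conc_uM: float,
                      barcoded_rate: float = 1.0, tso_rate: float = 1.0) -> float:
    """Fraction of second amplification products expected to carry a barcode.

    Toy proportional model: each oligonucleotide contributes in proportion to
    (concentration x relative rate). A tso_rate of 0 represents a template
    switching oligonucleotide that is unavailable for primer extension.
    """
    barcoded_weight = barcoded_rate * barcoded_conc_uM
    tso_weight = tso_rate * tso_conc_uM
    total = barcoded_weight + tso_weight
    if total == 0:
        raise ValueError("at least one oligonucleotide must be able to prime")
    return barcoded_weight / total


# Hypothetical case: template switching oligo in 10-fold excess, equal rates.
excess_case = barcoded_fraction(1.0, 10.0)
# Same concentrations, but the template switching oligo cannot prime.
blocked_case = barcoded_fraction(1.0, 10.0, tso_rate=0.0)
```

Under this model, a 10-fold excess of template switching oligonucleotide with equal rates leaves only a small fraction of products barcoded, whereas reducing the relative rate of the template switching oligonucleotide restores the barcoded fraction.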

In some embodiments, the template switching oligonucleotide is not available for primer extension during the second amplification reaction. In some embodiments, the template switching oligonucleotide is degraded prior to the second amplification reaction. In some embodiments, the template switching oligonucleotide is degraded during the second amplification reaction. The template switching oligonucleotide may comprise ribonucleic acids (RNA). A template switching oligonucleotide comprising RNA can be degraded, for example, by elevated temperatures or alkaline conditions. In some embodiments, the template switching oligonucleotide comprises at least 10%, 15%, 20%, 25%, 30%, 35%, 40%, 45%, 50%, 55%, 60%, 65%, 70%, 75%, 80%, 85%, 90%, or 95% RNA. In some embodiments, the template switching oligonucleotide comprises 100% RNA. In some embodiments, a first reaction rate of the second amplification reaction using the barcoded oligonucleotide is greater than a second reaction rate of the second amplification using the template switching oligonucleotide.

In some embodiments, the barcoded oligonucleotide can hybridize to the first amplification product at a higher annealing temperature as compared to the template switching oligonucleotide. For example, the first amplification product and the barcoded oligonucleotide can have a higher melting temperature as compared to a melting temperature of the first amplification product and the template switching oligonucleotide. In such cases, the second amplification reaction may be performed with an annealing temperature at which the barcoded oligonucleotide is able to hybridize to the first amplification product and initiate primer extension and at which the template switching oligonucleotide is unable to hybridize to the first amplification product and initiate primer extension. In some embodiments, the primer annealing temperature of the second amplification reaction is at least about 0.5° C., 1° C., 2° C., 3° C., 4° C., 5° C., 6° C., 7° C., 8° C., 9° C., or 10° C. greater than a primer annealing temperature of the first amplification reaction. The difference in melting temperatures can result from the presence of modified nucleotides in the template switching oligonucleotide. In some embodiments, the template switching oligonucleotide comprises at least 10%, 15%, 20%, 25%, 30%, 35%, 40%, 45%, 50%, 55%, 60%, 65%, 70%, 75%, 80%, 85%, 90%, or 95% modified nucleotides. In some embodiments, the template switching oligonucleotide comprises 100% modified nucleotides. In some embodiments, the difference in melting temperature can be the result of the presence of modified nucleotides in the barcoded oligonucleotide. In some embodiments, the barcoded oligonucleotide comprises at least 10%, 15%, 20%, 25%, 30%, 35%, 40%, 45%, 50%, 55%, 60%, 65%, 70%, 75%, 80%, 85%, 90%, or 95% modified nucleotides. In some embodiments, the barcoded oligonucleotide comprises 100% modified nucleotides. Modified nucleotides include, but are not limited to, 2-Aminopurine, 2,6-Diaminopurine (2-Amino-dA), inverted dT, 5-Methyl dC, 2′-deoxyinosine, Super T (5-hydroxybutynyl-2′-deoxyuridine), Super G (8-aza-7-deazaguanosine), locked nucleic acids (LNAs), unlocked nucleic acids (UNAs, e.g., UNA-A, UNA-U, UNA-C, UNA-G), Iso-dG, Iso-dC, and 2′ Fluoro bases (e.g., Fluoro C, Fluoro U, Fluoro A, and Fluoro G).
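By way of illustration only, the selection of an annealing temperature that favors the barcoded oligonucleotide can be sketched as follows. The sketch assumes the Wallace rule (2° C. per A/T and 4° C. per G/C) as a rough melting temperature estimate for short, unmodified DNA segments; the disclosure does not specify a melting temperature model, the rule does not account for modified nucleotides such as LNAs, and the function names `wallace_tm` and `annealing_window` are hypothetical.

```python
def wallace_tm(seq: str) -> float:
    """Rough melting temperature in deg C by the Wallace rule: 2*(A+T) + 4*(G+C).

    Assumption: short, unmodified DNA; modified bases are not modeled.
    """
    s = seq.upper()
    at = s.count("A") + s.count("T")
    gc = s.count("G") + s.count("C")
    return 2.0 * at + 4.0 * gc


def annealing_window(barcoded_segment: str, tso_segment: str):
    """Return (low, high) estimated temperatures between which the barcoded
    oligonucleotide segment is predicted to anneal while the template
    switching oligonucleotide segment is not, or None if no such window exists.
    """
    tm_barcoded = wallace_tm(barcoded_segment)
    tm_tso = wallace_tm(tso_segment)
    if tm_barcoded <= tm_tso:
        return None  # the barcoded oligonucleotide has no Tm advantage
    return (tm_tso, tm_barcoded)


if __name__ == "__main__":
    # Toy segments chosen only to show the calculation.
    print(annealing_window("GGCC", "ATAT"))
```

An annealing temperature chosen inside the returned window would, under these simplifying assumptions, favor priming by the barcoded oligonucleotide; a nearest-neighbor model would be a more realistic choice in practice.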

In various embodiments, the first amplification reaction is facilitated using an enzyme comprising polymerase activity. For example, the first amplification reaction can be facilitated by a DNA-dependent polymerase or a reverse transcriptase (e.g., RNA-dependent). In some embodiments, the first amplification reaction comprises polymerase chain reaction. In some embodiments, the first amplification reaction comprises reverse transcription. In various embodiments, the second amplification reaction is facilitated using an enzyme comprising polymerase activity. For example, the second amplification reaction can be facilitated by a DNA-dependent polymerase. In some embodiments, the second amplification reaction comprises polymerase chain reaction.

In another aspect, a template polynucleotide comprising mRNA may first be reverse transcribed to cDNA (e.g., an amplification product of the template polynucleotide). The mRNA molecule can be reverse transcribed to cDNA using a reverse transcriptase enzyme and a primer, such as a poly-T primer. Non-limiting examples of enzymes that can be used for reverse transcription in embodiments herein include HIV-1 reverse transcriptase, M-MLV reverse transcriptase, AMV reverse transcriptase, telomerase reverse transcriptase, and variants, modified products and derivatives thereof.

A gene-specific primer having a barcode sequence can then be used for primer extension using the cDNA molecule (e.g., amplification product of the template polynucleotide) as a template. A primer comprising a barcode can hybridize to the cDNA molecule via sequence complementarity. Extension of the primer using the cDNA molecule as template may result in a polynucleotide product comprising the template polynucleotide sequence and the barcode sequence located at the 5′ end of the template polynucleotide sequence. Any of a variety of polymerases can be used in embodiments herein for primer extension, non-limiting examples of which include exonuclease minus DNA Polymerase I large (Klenow) Fragment, Phi29 DNA polymerase, Taq DNA Polymerase, T4 DNA polymerase, T7 DNA polymerase, and the like. Further examples of polymerase enzymes that can be used in embodiments herein include thermostable polymerases, including but not limited to, Thermus thermophilus HB8; Thermus oshimai; Thermus scotoductus; Thermus thermophilus 1B21; Thermus thermophilus GK24; Thermus aquaticus polymerase AmpliTaq® FS or Taq (G46D; F667Y), Taq (G46D; F667Y; E681I), and Taq (G46D; F667Y; T664N; R660G); Pyrococcus furiosus polymerase; Thermococcus gorgonarius polymerase; Pyrococcus species GB-D polymerase; Thermococcus sp. (strain 9 deg. N-7) polymerase; Bacillus stearothermophilus polymerase; Tsp polymerase; Thermus flavus polymerase; Thermus litoralis polymerase; Thermus Z05 polymerase; delta Z05 polymerase (e.g. delta Z05 Gold DNA polymerase); and mutants, variants, or derivatives thereof. In some embodiments, a hot start polymerase is used. A hot start polymerase is a modified form of a DNA polymerase that can be activated by incubation at elevated temperatures. Such a polymerase can be used, for example, to further increase sensitivity, specificity, and yield; and/or to further improve low copy target amplification.

In another aspect, a barcode sequence is appended to the 5′ end of a template polynucleotide sequence by ligating an oligonucleotide comprising a barcode sequence directly to the 5′ end of the template polynucleotide. Ligating an oligonucleotide comprising a barcode sequence to a template polynucleotide can be implemented by various methods. In some embodiments herein, ligating an oligonucleotide comprising a barcode sequence to a template polynucleotide involves an enzyme, such as a ligase (e.g., an RNA ligase or a DNA ligase). Non-limiting examples of enzymes that can be used for ligation in embodiments herein include ATP-dependent double-stranded polynucleotide ligases, NAD+ dependent DNA or RNA ligases, and single-strand polynucleotide ligases. Non-limiting examples of ligases which can be used in embodiments herein include CircLigase I and CircLigase II (Epicentre; Madison, Wis.), Escherichia coli DNA ligase, Thermus filiformis DNA ligase, Tth DNA ligase, Thermus scotoductus DNA ligase (I and II), T3 DNA ligase, T4 DNA ligase, T4 RNA ligase, T7 DNA ligase, Taq ligase, Ampligase (Epicentre® Technologies Corp.), VanC-type ligase, 9° N DNA Ligase, Tsp DNA ligase, DNA ligase I, DNA ligase III, DNA ligase IV, Sso7-T3 DNA ligase, Sso7-T4 DNA ligase, Sso7-T7 DNA ligase, Sso7-Taq DNA ligase, Sso7-E. coli DNA ligase, Sso7-Ampligase DNA ligase, and thermostable ligases. Ligase enzymes may be wild-type, mutant isoforms, or genetically engineered variants.

In some embodiments where a barcode oligonucleotide is ligated to a template polynucleotide comprising mRNA, the mRNA molecule can be treated to yield a 5′ monophosphate group prior to ligating. Any suitable reaction may be employed to yield a 5′ monophosphate group. For example, the mRNA molecule can be treated with an enzyme such as a pyrophosphohydrolase. An example of a pyrophosphohydrolase that can be used in embodiments herein is RNA 5′ pyrophosphohydrolase (RppH). In some cases, all of the phosphate groups at the 5′ end of the molecule are removed and a single phosphate group is added back to the 5′ end. In some cases, two phosphate groups are removed from a triphosphate group to yield a monophosphate. In some cases, a single enzyme both removes the phosphate groups present on the mRNA molecule and adds the monophosphate group. In some cases, a first enzyme removes the phosphate groups present on the mRNA molecule and a second enzyme adds the monophosphate group. In some cases, the phosphate groups are removed from the 5′ end of the mRNA molecule and the 5′ end is adenylated. An enzyme which can be used for 5′ adenylation in embodiments herein includes Mth RNA ligase.

In some cases, the oligonucleotide comprising the barcode sequence is ligated to the template polynucleotide within a partition (e.g., droplet or well). A partition, in some cases, comprises a polynucleotide sample comprising the template polynucleotide, an oligonucleotide having the barcode sequence, a ligase enzyme, and any other suitable reagents for ligation. The ligase can implement the attachment of the oligonucleotide comprising the barcode sequence to the template polynucleotide within the partition. In some cases, the template polynucleotide is an mRNA molecule and the oligonucleotide ligated to it is a DNA molecule. In some cases, the oligonucleotide comprising the barcode sequence is ligated to the template polynucleotide outside of a partition.

Following the attachment of an oligonucleotide comprising a barcode sequence to the 5′ end of a template polynucleotide, for example an mRNA polynucleotide, the barcoded template can be subjected to further amplification. In some cases, one or more further amplification reactions are performed within the partition. In some cases, one or more further amplification reactions are performed outside of a partition. In some cases, a plurality of barcoded mRNA polynucleotides, for example from a plurality of partitions, is pooled and subjected to further processing in bulk. In some embodiments, the barcoded template polynucleotide is subjected to polymerase chain reaction. In some embodiments, the template polynucleotide comprises mRNA and the barcoded template polynucleotide is subjected to reverse transcription, yielding a cDNA transcript. In embodiments where reverse transcription is performed in a partition, the partitions can comprise primers having a poly-T region capable of hybridizing to the poly-A region of the barcoded mRNA. Within the partition, the primer having a poly-T region can hybridize to the barcoded template and initiate primer extension in reverse transcription. Non-limiting examples of enzymes that can be used for reverse transcription in embodiments herein include HIV-1 reverse transcriptase, M-MLV reverse transcriptase, AMV reverse transcriptase, telomerase reverse transcriptase, and variants, modified products and derivatives thereof. A partition can contain a reverse transcriptase enzyme capable of reverse transcribing a template polynucleotide that is attached at its 5′ end to a barcoded oligonucleotide. In embodiments where reverse transcription is performed in bulk, a plurality of barcoded mRNA polynucleotides from a plurality of partitions can be pooled for bulk processing. The reaction volume for performing reverse transcription can comprise primers having a poly-T region capable of hybridizing to the poly-A region of a barcoded mRNA. In some cases, the primers for reverse transcription further comprise additional elements, such as tags, which can be used, for example, for isolating cDNA transcripts. For example, cDNA transcripts comprising biotin tags can be isolated from components of the reaction volume (e.g., excess primers, reverse transcriptase enzyme, barcoded mRNA molecules) by performing a purification reaction with streptavidin or another molecule capable of binding biotin.

Following the generation of barcoded template polynucleotides or derivatives (e.g., amplification products) thereof, subsequent operations may be performed, including purification (e.g., via solid phase reversible immobilization (SPRI)) or further processing (e.g., shearing, addition of functional sequences, and subsequent amplification (e.g., via PCR)). Functional sequences, such as flow cell sequences, may be added by ligation. These operations may occur in bulk (e.g., outside the partition). In the case where a partition is a droplet in an emulsion, the emulsion can be broken and the contents of the droplet pooled for additional operations. Additional reagents that may be co-partitioned along with the barcode-bearing bead may include oligonucleotides to block ribosomal RNA (rRNA) and nucleases to digest genomic DNA from cells. Alternatively, rRNA removal agents may be applied during additional processing operations. The configuration of the constructs generated by such a method can help minimize (or avoid) sequencing of the poly-T sequence during sequencing and/or sequence the 5′ end of a polynucleotide sequence. The amplification products, for example first amplification products and/or second amplification products, may be subject to sequencing for sequence analysis.

Although operations with various barcode designs have been discussed individually, individual beads can include barcode oligonucleotides of various designs for simultaneous use.

In addition to characterizing individual cells or cell sub-populations from larger populations, the processes and systems described herein may also be used to characterize individual cells as a way to provide an overall profile of a cellular or other organismal population. A variety of applications require the evaluation of the presence and quantification of different cell or organism types within a population of cells, including, for example, microbiome analysis and characterization, environmental testing, food safety testing, and epidemiological analysis, e.g., in tracing contamination or the like. In particular, the analysis processes described above may be used to individually characterize, sequence and/or identify large numbers of individual cells within a population. This characterization may then be used to assemble an overall profile of the originating population, which can provide important prognostic and diagnostic information.

For example, shifts in human microbiomes, including, e.g., gut, buccal, epidermal microbiomes, etc., have been identified as being both diagnostic and prognostic of different conditions or general states of health. Using the single cell analysis methods and systems described herein, one can, again, characterize, sequence and identify individual cells in an overall population, and identify shifts within that population that may be indicative of diagnostically relevant factors. By way of example, sequencing of bacterial 16S ribosomal RNA genes has been used as a highly accurate method for taxonomic classification of bacteria. Using the targeted amplification and sequencing processes described above can provide identification of individual cells within a population of cells. One may further quantify the numbers of different cells within a population to identify current states or shifts in states over time. See, e.g., Morgan et al., PLoS Comput. Biol., Ch. 12, December 2012, 8(12):e1002808, and Ram et al., Syst. Biol. Reprod. Med., June 2011, 57(3):162-170, each of which is incorporated herein by reference in its entirety for all purposes. Likewise, identification and diagnosis of infection or potential infection may also benefit from the single cell analyses described herein, e.g., to identify microbial species present in large mixes of other cells or other biological material, cells and/or nucleic acids, including the environments described above, as well as any other diagnostically relevant environments, e.g., cerebrospinal fluid, blood, fecal or intestinal samples, or the like.
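As a non-limiting illustration of quantifying cell types within a population and identifying shifts over time, the following sketch computes the fractional composition of two samples and the change in each fraction. It assumes each cell has already been assigned a taxonomic label (e.g., from 16S sequence analysis); the labels and the function names `composition` and `composition_shift` are hypothetical.

```python
from collections import Counter


def composition(cell_labels):
    """Return the fraction of cells carrying each label."""
    counts = Counter(cell_labels)
    total = sum(counts.values())
    if total == 0:
        return {}
    return {label: n / total for label, n in counts.items()}


def composition_shift(labels_before, labels_after):
    """Return, per label, the change in population fraction (after minus before)."""
    before = composition(labels_before)
    after = composition(labels_after)
    return {
        label: after.get(label, 0.0) - before.get(label, 0.0)
        for label in set(before) | set(after)
    }


if __name__ == "__main__":
    # Hypothetical per-cell labels from two time points.
    t0 = ["taxon_A", "taxon_A", "taxon_B", "taxon_B"]
    t1 = ["taxon_A", "taxon_B", "taxon_B", "taxon_B"]
    print(composition_shift(t0, t1))
```

Whether a given shift is diagnostically relevant would depend on the application and on appropriate statistical treatment, which this sketch does not attempt.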

The foregoing analyses may also be particularly useful in the characterization of potential drug resistance of different cells or pathogens, e.g., cancer cells, bacterial pathogens, etc., through the analysis of distribution and profiling of different resistance markers/mutations across cell populations in a given sample. Additionally, characterization of shifts in these markers/mutations across populations of cells over time can provide valuable insight into the progression, alteration, prevention, and treatment of a variety of diseases characterized by such drug resistance issues.

Although described in terms of cells, it will be appreciated that any of a variety of individual biological organisms, or components of organisms, are encompassed within this description, including, for example, cells, viruses, organelles, cellular inclusions, vesicles, or the like. Additionally, where referring to cells, it will be appreciated that such reference includes any type of cell, including without limitation prokaryotic cells, eukaryotic cells, bacterial, fungal, plant, mammalian, or other animal cell types, mycoplasmas, normal tissue cells, tumor cells, or any other cell type, whether derived from single cell or multicellular organisms.

Similarly, analysis of different environmental samples to profile the microbial organisms, viruses, or other biological contaminants that are present within such samples can provide important information about disease epidemiology, and potentially aid in forecasting disease outbreaks, epidemics and pandemics.

As described above, the methods, systems and compositions described herein may also be used for analysis and characterization of other aspects of individual cells or populations of cells. In one example process, a sample is provided that contains cells that are to be analyzed and characterized as to their cell surface proteins. Also provided is a library of antibodies, antibody fragments, or other molecules having a binding affinity to the cell surface proteins or antigens (or other cell features) for which the cell is to be characterized (also referred to herein as cell surface feature binding groups). For ease of discussion, these affinity groups are referred to herein as binding groups. The binding groups can include a reporter molecule that is indicative of the cell surface feature to which the binding group binds. In particular, a binding group type that is specific to one type of cell surface feature will comprise a first reporter molecule, while a binding group type that is specific to a different cell surface feature will have a different reporter molecule associated with it. In some aspects, these reporter molecules will comprise oligonucleotide sequences. Oligonucleotide based reporter molecules provide advantages of being able to generate significant diversity in terms of sequence, while also being readily attachable to most biomolecules, e.g., antibodies, etc., as well as being readily detected, e.g., using sequencing or array technologies. In the example process, the binding groups include oligonucleotides attached to them. Thus, a first binding group type, e.g., antibodies to a first type of cell surface feature, will have associated with it a reporter oligonucleotide that has a first nucleotide sequence. Different binding group types, e.g., antibodies having binding affinity for other, different cell surface features, will have associated therewith reporter oligonucleotides that comprise different nucleotide sequences, e.g., having a partially or completely different nucleotide sequence. In some cases, for each type of cell surface feature binding group, e.g., antibody or antibody fragment, the reporter oligonucleotide sequence may be known and readily identifiable as being associated with the known cell surface feature binding group. These oligonucleotides may be directly coupled to the binding group, or they may be attached to a bead, molecular lattice, e.g., a linear, globular, cross-linked, or other polymer, or other framework that is attached or otherwise associated with the binding group, which allows attachment of multiple reporter oligonucleotides to a single binding group.

In the case of multiple reporter molecules coupled to a single binding group, such reporter molecules can comprise the same sequence, or a particular binding group will include a known set of reporter oligonucleotide sequences. As between different binding groups, e.g., specific for different cell surface features, the reporter molecules can be different and attributable to the particular binding group.

Attachment of the reporter groups to the binding groups may be achieved through any of a variety of direct or indirect, covalent or non-covalent associations or attachments. For example, in the case of oligonucleotide reporter groups associated with antibody based binding groups, such oligonucleotides may be covalently attached to a portion of an antibody or antibody fragment using chemical conjugation techniques (e.g., Lightning-Link® antibody labeling kits available from Innova Biosciences), or attached using non-covalent attachment mechanisms, e.g., using biotinylated antibodies and oligonucleotides (or beads that include one or more biotinylated linkers, coupled to oligonucleotides) with an avidin or streptavidin linker. Antibody and oligonucleotide biotinylation techniques are available (See, e.g., Fang, et al., Fluoride-Cleavable Biotinylation Phosphoramidite for 5′-end-Labeling and Affinity Purification of Synthetic Oligonucleotides, Nucleic Acids Res. Jan. 15, 2003; 31(2):708-715, and DNA 3′ End Biotinylation Kit, available from Thermo Scientific, the full disclosures of which are incorporated herein by reference in their entirety for all purposes). Likewise, protein and peptide biotinylation techniques have been developed and are readily available (See, e.g., U.S. Pat. No. 6,265,552, the full disclosure of which is incorporated herein by reference in its entirety for all purposes).

The reporter oligonucleotides may be provided having any of a range of different lengths, depending upon the diversity of reporter molecules desired for a given analysis, the sequence detection scheme employed, and the like. In some cases, these reporter sequences can be greater than about 5 nucleotides in length, greater than about 10 nucleotides in length, greater than about 20, 30, 40, 50, 60, 70, 80, 90, 100, 120, 150 or even 200 nucleotides in length. In some cases, these reporter oligonucleotides may be less than about 250 nucleotides in length, less than about 200, 180, 150, 120, 100, 90, 80, 70, 60, 50, 40, or even 30 nucleotides in length. In many cases, the reporter oligonucleotides may be selected to provide barcoded products that are already sized, and otherwise configured, to be analyzed on a sequencing system. For example, these sequences may be provided at a length that creates sequenceable products of a desired length for particular sequencing systems. Likewise, these reporter oligonucleotides may include additional sequence elements, in addition to the reporter sequence, such as sequencer attachment sequences, sequencing primer sequences, amplification primer sequences, or the complements to any of these.

In operation, a cell-containing sample is incubated with the binding groups and their associated reporter oligonucleotides, for any of the cell surface features desired to be analyzed. Following incubation, the cells are washed to remove unbound binding groups. Following washing, the cells are partitioned into separate partitions, e.g., droplets, along with the barcode carrying beads described above, where each partition includes a limited number of cells, e.g., in some cases, a single cell. Upon release from the beads, the barcodes will prime the amplification and barcoding of the reporter oligonucleotides. As noted above, the barcoded replicates of the reporter molecules may additionally include functional sequences, such as primer sequences, attachment sequences or the like.

The barcoded reporter oligonucleotides are then subjected to sequence analysis to identify which reporter oligonucleotides bound to the cells within the partitions. Further, by also sequencing the associated barcode sequence, one can identify that a given cell surface feature likely came from the same cell as other, different cell surface features, whose reporter sequences include the same barcode sequence, i.e., they were derived from the same partition.

Based upon the reporter molecules that emanate from an individual partition, as identified by the presence of the barcode sequence, one may then create a cell surface profile of individual cells from a population of cells. Profiles of individual cells or populations of cells may be compared to profiles from other cells, e.g., ‘normal’ cells, to identify variations in cell surface features, which may provide diagnostically relevant information. In particular, these profiles may be useful in the diagnosis of a variety of disorders that are characterized by variations in cell surface receptors, such as cancer and other disorders.
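The grouping of reporter sequences by shared barcode described above can be sketched, by way of example only, as follows. The sketch assumes that each sequencing read has already been reduced to a (barcode, reporter sequence) pair and that a lookup table maps each known reporter sequence to its cell surface feature; the sequences, feature names, and the function name `surface_profiles` are hypothetical.

```python
from collections import Counter, defaultdict


def surface_profiles(reads, reporter_to_feature):
    """Group reporter reads by partition barcode and count features per barcode.

    reads: iterable of (barcode, reporter_sequence) pairs.
    reporter_to_feature: dict mapping known reporter sequences to feature names.
    Reads whose reporter sequence is not in the table are skipped.
    """
    profiles = defaultdict(Counter)
    for barcode, reporter in reads:
        feature = reporter_to_feature.get(reporter)
        if feature is not None:
            profiles[barcode][feature] += 1
    return {barcode: dict(counts) for barcode, counts in profiles.items()}


if __name__ == "__main__":
    # Hypothetical reporter table and reads.
    table = {"ACGTAC": "feature_1", "TTGGCC": "feature_2"}
    reads = [
        ("BC01", "ACGTAC"),
        ("BC01", "TTGGCC"),
        ("BC01", "ACGTAC"),
        ("BC02", "TTGGCC"),
        ("BC02", "NNNNNN"),  # unknown reporter, ignored
    ]
    print(surface_profiles(reads, table))
```

Each resulting per-barcode count table is one candidate cell surface profile, which may then be compared against profiles from reference cells.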

The present disclosure also provides methods for reducing nonspecific priming in a single-cell 5′ gene expression assay. In generating an assay that allows simultaneous measurement of 1) a cell barcode sequence (barcode), 2) a unique molecular identifier sequence (UMI) and 3) the 5′ sequence of an mRNA transcript, one strategy is to place the barcode and UMI on a sequence that attaches to the 5′ end of an mRNA transcript. In the present disclosure, this may be accomplished by placing the barcode and UMI on a template switching oligonucleotide (TSO). This oligonucleotide may be attached to the first-strand cDNA via a template switching reaction where the reverse transcription (RT) enzyme 1) reverse transcribes a messenger RNA (mRNA) sequence into first-strand complementary DNA (cDNA) from a primer targeting the 3′ end of the mRNA, 2) adds nontemplated cytidines to the 3′ end of the first-strand cDNA, and 3) switches template to the TSO, which may contain 3′ guanosines or guanosine derivatives that hybridize to the added cytidines. The result is a first-strand cDNA molecule that is complementary to the TSO sequence (cell barcode, UMI, guanosines) and to the 5′ end of the mRNA.

In some cases, the TSO may co-exist in solution with the RT enzyme and the total RNA contents of a cell. If the TSO is a single-stranded DNA (ssDNA) molecule, it can participate as an RT primer rather than as a template-switching substrate. Given, for example, that over 90% of the total RNA content of a cell is noncoding ribosomal RNA (rRNA), this may produce barcoded off-target products that do not contribute to the 5′ gene expression or V(D)J sequencing assay but do consume sequencing reads, increasing the cost required to achieve the same sequencing depth. In addition, if the UMI is implemented as a randomer, the presence of this randomer at the 3′ end of the TSO greatly increases its ability to serve as a primer on rRNA template.

In some cases, a TSO may be used that is less likely to serve as an RT primer owing to the introduction of a particular spacer sequence between the UMI and the terminal riboGs. Another approach is to design and include a set of auxiliary blocking oligonucleotides that may hybridize to rRNA and prevent binding of the TSO.

The spacer sequence can be optimized by selecting a sequence that minimizes the predicted melting temperature of the (spacer-GGG):rRNA duplex against all human ribosomal RNA molecules.

The blocker sequences can be optimized by selecting sequences that maximize the predicted melting temperature of the (blocker):rRNA duplex against all human ribosomal RNA molecules.
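The spacer optimization described above can be illustrated with the following minimal sketch. The disclosure does not state how the melting temperature is predicted, so the sketch makes several labeled assumptions: the predicted melting temperature is approximated by the Wallace rule applied to the longest 3′-anchored stretch of the (spacer-GGG) sequence that is perfectly complementary to the rRNA; the rRNA is treated in a DNA alphabet (U as T); and the toy rRNA sequence and the function names `priming_score` and `pick_spacer` are hypothetical. A nearest-neighbor thermodynamic model would be a more realistic choice.

```python
_COMPLEMENT = str.maketrans("ACGT", "TGCA")


def revcomp(seq: str) -> str:
    """Reverse complement of a DNA sequence."""
    return seq.upper().translate(_COMPLEMENT)[::-1]


def wallace_tm(seq: str) -> float:
    """Rough Tm in deg C by the Wallace rule (assumption, not from the source)."""
    s = seq.upper()
    return 2.0 * (s.count("A") + s.count("T")) + 4.0 * (s.count("G") + s.count("C"))


def priming_score(oligo: str, rrna: str) -> float:
    """Wallace Tm of the longest 3'-anchored suffix of `oligo` whose reverse
    complement occurs in `rrna`; 0.0 if even the last base has no match."""
    target = rrna.upper().replace("U", "T")
    oligo = oligo.upper()
    best = 0.0
    for k in range(1, len(oligo) + 1):
        suffix = oligo[-k:]
        if revcomp(suffix) in target:
            best = wallace_tm(suffix)
        else:
            break  # a longer suffix cannot match if a shorter one does not
    return best


def pick_spacer(candidates, rrnas):
    """Choose the spacer whose (spacer-GGG) has the lowest worst-case score."""
    def worst_case(spacer):
        return max(priming_score(spacer + "GGG", r) for r in rrnas)
    return min(candidates, key=worst_case)


if __name__ == "__main__":
    toy_rrna = "AAGCCCGCGCGCAA"  # hypothetical stand-in for an rRNA sequence
    print(pick_spacer(["TTATAT", "GCGCGC"], [toy_rrna]))
```

Blocker selection would follow the same pattern with the objective reversed, i.e., keeping candidates with the highest predicted melting temperature against the rRNA rather than the lowest.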

Provided herein are TSOs that are less likely to serve as an RT primer owing to the introduction of a particular spacer sequence between the UMI and the terminal riboGs. Additionally, described herein are auxiliary blocking oligonucleotides that hybridize to rRNA and prevent binding of the TSO.

Table 1 provides examples of spacer sequences that are optimized by selecting a sequence that minimizes the predicted melting temperature of the (spacer-GGG):rRNA duplex against all human ribosomal RNA molecules.

TABLE 1 Spacer sequences

Name        Sequence                            SEQ ID NO
GG_S1_6     TTATATGGG                           -
GG_S1_10    TTTCTTATATGGG                       1
GG_S1_20    AAATCAAATCTTTCTTATATGGG             2
GG_S1_30    ACAAACAAATAAATCAAATCTTTCTTATATGGG   3
GG_S2_6     TTTAAAGGG                           -
GG_S2_10    GAAATTTAAAGGG                       4
GG_S2_20    CACTCTACATGAAATTTAAAGGG             5
GG_S2_30    CCAAAGTTGTCACTCTACATGAAATTTAAAGGG   6
GL6_S3_6    ATATAAGGG                           -
GL6_S3_10   ATATATATAAGGG                       7
GL6_S3_20   ATATATATATATATATATAAGGG             8
GL6_S3_30   ATATATATATATATATATATATATATATAAGGG   9

Table 2 provides examples of blocker sequences that are optimized by selecting sequences that maximize the predicted melting temperature of the (blocker):rRNA duplex against all human ribosomal RNA molecules.

TABLE 2 Blocker sequences SEQ ID NO 28S_30_3130GCCGGCCGCCCCGGCGGCCGCCGCGCGGCC 10 18S_30_254GCCGCCGGCGCCCGCCCCCCGGCCGGGGCC 11 28S_30_2088GCGCGCGCGCGCGCCGCCCCCGCCGCTCCC 12 28S_30_3284GGGGCGCGCCGCGCCGCCGCCGGGCTCCCC 13 28S_30_834GCCGCCGCCACCGCCGCCGCCGCCGCCGCC 14 28S_30_3373GCCCCGCCCCGCCGCCCGCCGACCGCCGCC 15 28S_30_3473GCGGCCCCTCCGCCGCCTGCCGCCGCCGCC 16 28S_30_4105GGAGCGGGTCGCGCCCGGCCGGGCGGGCGC 17 28S_30_1129GCCCCGCCCCCCGACCCGCGCGCGGCACCC 18 28S_30_3989GGCGGCCCGCAGGGCCGCGGACCCCGCCCC 19 28S_30_4781GGCGGGGCACGCGCCCTCCCGCGGCGGGGC 20 18S_30_1750GCCAGGGCCGTGGGCCGACCCCGGCGGGGC 21 28S_30_611GTCCCCCGCCGACCCCACCCCCGGCCCCGC 22 18S_30_693GGCTCGCCTCGCGGCGGACCGCCCGCCCGC 23 28S_30_232GACCCGGGCGCGCGCCGGCCGCTACCGGCC 24 28S_30_2919GCGCGCCTCGTCCAGCCGCGGCGCGCGCCC 25 28S_30_1050GCGCCGTGGGAGGGGTGGCCCGGCCCCCCC 26 28S_30_725GGGCCCCCCGAGCCACCTTCCCCGCCGGGC 27 28S_30_2295GGCGGCTCCACCCGGGCCCGCGCCCTAGGC 28 28S_30_3004GGCGCGGGGTGGGGAGGGAGCGAGCGGCGC 29 28S_30_3547GCTAGGCGCCGGCCGAGGCGAGGCGCGCGC 30 28S_30_115GTCCCGCGCCCCGCGGGGCGGGGATTCGGC 31 28S_30_4858GGGGCGGCCGCCTTTCCGGCCGCGCCCCGT 32 28S_30_1451ACCTCCCCGGCGCGGCGGGCGAGACGGGCC 33 28S_30_472GATCCGCCGGGCCGCCGACACGGCCGGACC 34 28S_30_1246GCCGACCCCGTGCGCTCGCTCCGCCGTCCC 35 28S_30_936GCGCGGCGACGGGTCTCGCTCCCTCGGCCC 36 28S_30_3207GCCCGGCTCGCGTCCAGAGTCCGCGCCGCC 37 28S_30_2578TCCCCGGGGCTCCCGCCGGCTTCTCCGGGA 38 28S_30_1380ACCTCGGCCGGCGAGCGCGCCGGCCTTCAC 39 28S_30_1791ACGCCCGGCTCCACGCCAGCGAGCCGGGCT 40 28S_30_2684GCTCACCGGACGCCGCCGGAACCGCGACGC 41 28S_30_2441TCGCCCGTCCCTTCGGAACGGCGCTCGCCC 42 28S_30_1671GGGGTGCGTCGGGTCTGCGAGAGCGCCAGC 43 28S_30_4696GGCCAACCGAGGCTCCGCGGCGCTGCCGTA 44 18S_30_1551GTTACCCGCGCCTGCCGGCGTAGGGTAGGC 45 28S_30_3634GCGTCAACACCCGCCGCGGGCCTTCGCGAT 46 18S_30_827AGCTGCGGTATCCAGGCGGCTCGGGCCTGC 47 28S_30_1883GCGTCGGCATCGGGCGCCTTAACCCGGCGT 48 18S_30_1088GGGAATAACGCCGCCGCATCGCCGGTCGGC 49 18S_30_923GCGGCGCAATACGAATGCCCCCGGCCGTCC 50 28S_30_2755TGCTGCGGATATGGGTACGGCCCGGCGCGA 51 18S_30_328GGGCAGACGTTCGAATGGGTCGTCGCCGCC 52 
18S_30_1207GCCGCAGGCTCCACTCCTGGTGGTGCCCTT 53 18S_30_597ACCGCGGCTGCTGGCACCAGACTTGCCCTC 54 18S_30_473GGGTCGGGAGTGGGTAATTTGCGCGCCTGC 55 28S_30_1AGCGGGTCGCCACGTCTGATCTGAGGTCGC 56 28S_30_3851TTCCCCGCTGATTCCGCCAAGCCCGTTCCC 57 28S_30_1556TGCACGTCAGGACCGCTACGGACCTCCACC 58 28S_30_1954AGCGGATTCCGACTTCCATGGCCACCGTCC 59 28S_30_4608AGCTTCGCCCCATTGGCTCCTCAGCCAAGC 60 28S_30_4971GATCGCAGCGAGGGAGCTGCTCTGCTACGT 61 18S_30_1311GAACGGCCATGCACCACCACCCACGGAATC 62 18S_30_1446TCTCGGGTGGCTGAACGCCACTTGTCCCTC 63 18S_30_164GGGTCAGCGCCCGTCGGCATGTATTAGCTC 64 28S_30_4191TCCTCCCTGAGCTCGCCTTAGGACACCTGC 65 18S_30_400GGAATCGAACCCTGATTCCCCGTCACCCGT 66 18S_30_1679ACGGGCGGTGTGTACAAAGGGCAGGGACTT 67 28S_30_2827TACGGATCCGGCTTGCCGACTTCCCTTACC 68 18S_30_46ACCGGCCGTGCGTACTTAGACATGCATGGC 69 28S_30_3716GTCATAGTTACTCCCGCCGTTTACCCGCGC 70 18S_30_1837GATCCTTCCGCAGGTTCACCTACGGAAACC 71 28S_30_4484TCACGACGGTCTAAACCCAGCTCACGTTCC 72 28S_30_4273GGCCCCGCTTTCACGGTCTGTATTCGTACT 73 28S_30_328GTACTTGTTGACTATCGGTCTCGTGCCGGT 74 28S_30_2368GGAACCCTTCTCCACTTCGGCCTTCAAAGT 75 28S_30_401ACCCGTTTACCTCTTAACGGTTTCACGCCC 76 18S_30_1017GAACCTCCGACTTTCGTTCTTGATTAATGA 77 28S_50_3116GCCCCCGCCGGCCGCCCCGGCGGCCGCCGCGCG 78 GCCCCTGCCGCCCCGAC 28S_50_823GCCCCCGCCGCCGCCGCCACCGCCGCCGCCGCC 79 GCCGCCCCGACCCGCGC 28S_50_3463GGACCGGCCCGCGGCCCCTCCGCCGCCTGCCGC 80 CGCCGCCGCCGCGCGCC 28S_50_3353GCCCCGCCCCGCCGCCCGCCGACCGCCGCCGCC 81 CGACCGCTCCCGCCCCC 28S_50_1113GTCCGCCCCGCCCCCCGACCCGCGCGCGGCACC 82 CCCCCCGTCGCCGGGGC 28S_50_4779ACCCCGGTCCCGGCGCGCGGCGGGGCACGCGCC 83 CTCCCGCGGCGGGGCGC 28S_50_569GGCCCCGCCCGCCCACCCCCGCACCCGCCGGAG 84 CCCGCCCCCTCCGGGGA 28S_50_3969GGCGGCCCGCAGGGCCGCGGACCCCGCCCCGGG 85 CCCCTCGCGGGGACACC 28S_50_2094GCCGCCCTCCGACGCACACCACACGCGCGCGCG 86 CGCGCGCCGCCCCCGCC 28S_50_2904GGGGCGCGCGCCTCGTCCAGCCGCGGCGCGCGC 87 CCAGCCCCGCTTCGCGC 18S_50_235AGCCGCCGGCGCCCGCCCCCCGGCCGGGGCCGG 88 AGAGGGGCTGACCGGGT 18S_50_690GGGCGGGGACGGGCGGTGGCTCGCCTCGCGGCG 89 GACCGCCCGCCCGCTCC 28S_50_3189GGGCCCGGCTCGCGTCCAGAGTCCGCGCCGCCG 90 CCGGCCCCCCGGGTCCC 
28S_50_4097GGCACTGTCCCCGGAGCGGGTCGCGCCCGGCCG 91 GGCGGGCGCTTGGCGCC 28S_50_1030GCGCCGTGGGAGGGGTGGCCCGGCCCCCCCACG 92 AGGAGACGCCGGCGCGC 28S_50_1434TCCACCTCCCCGGCGCGGCGGGCGAGACGGGCC 93 GGTGGTGCGCCCTCGGC 28S_50_687TCCCCGCCGGGCCTTCCCAGCCGTCCCGGAGCC 94 GGTCGCGGCGCACCGCC 28S_50_3534GTCGGCTGCTAGGCGCCGGCCGAGGCGAGGCGC 95 GCGCGGAACCGCGGCCC 28S_50_207GGGCGCGCGCCGGCCGCTACCGGCCTCACACCG 96 TCCACGGGCTGGGCCTC 28S_50_1187GGGACGCGCGCGTGGCCCCGAGAGAACCTCCCC 97 CGGGCCCGACGGCGCGA 28S_50_461GGCGGGAAAGATCCGCCGGGCCGCCGACACGG 98 CCGGACCCGCCGCCGGGT 18S_50_1750GACCGTCTTCTCAGCGCTCCGCCAGGGCCGTGG 99 GCCGACCCCGGCGGGGC 28S_50_131AGCGGCGCCGGGGAGCGGGTCTTCCGTACGCCA 100 CATGTCCCGCGCCCCGC 28S_50_2240GGCTCACCGCAGCGGCCCTCCTACTCGTCGCGG 101 CGTAGCGTCCGCGGGGC 28S_50_916GCGCGGCGACGGGTCTCGCTCCCTCGGCCCCGG 102 GATTCGGCGAGTGCTGC 28S_50_1360ACCTCGGCCGGCGAGCGCGCCGGCCTTCACCTT 103 CATTGCGCCACGGCGGC 28S_50_2442GGCCGAGGGCAACGGAGGCCATCGCCCGTCCCT 104 TCGGAACGGCGCTCGCC 28S_50_4678GAGGCCAACCGAGGCTCCGCGGCGCTGCCGTAT 105 CGTTCGCCTGGGCGGGA 28S_50_2665AGCTCACCGGACGCCGCCGGAACCGCGACGCTT 106 TCCAAGGCACGGGCCCC 28S_50_1259TCAAGACGGGTCGGGTGGGTAGCCGACGTCGCC 107 GCCGACCCCGTGCGCTC 28S_50_2560TCTCCCCGGGGCTCCCGCCGGCTTCTCCGGGAT 108 GGTCGCGTTACCGCAC 18S_50_1085GCTGCCCGGCGGGTCATGGGAATAACGCCGCCG 109 CATCGCCGGTCGGCATC 28S_50_1796GTGGCCCACTAGGCACTCGCATTCCACGCCCGG 110 CTCCACGCCAGCGAGCC 28S_50_2020GCGACGGCCGGGTATGGGCCCGACGCTCCAGCG 111 CCATCCATTTTCAGGGC 18S_50_1509GGGTAGGCACACGCTGAGCCAGTCAGTGTAGCG 112 CGCGTGCAGCCCCGGAC 28S_50_1875GGGGTCTGATGAGCGTCGGCATCGGGCGCCTTA 113 ACCCGGCGTTCGGTTCA 28S_50_3635GCACTGGGCAGAAATCACATCGCGTCAACACCC 114 GCCGCGGGCCTTCGCGA 28S_50_2757GAGGCTGTTCACCTTGGAGACCTGCTGCGGATA 115 TGGGTACGGCCCGGCGC 18S_50_469TCGTCACTACCTCCCCGGGTCGGGAGTGGGTAA 116 TTTGCGCGCCTGCTGCC 28S_50_4889GAATGGTTTAGCGCCAGGTTCCCCACGAACGTG 117 CGGTGCGTGACGGGCGA 28S_50_1646GCGTCGGGTCTGCGAGAGCGCCAGCTATCCTGA 118 GGGAAACTTCGGAGGGA 28S50 1557ACCCAGGTCGGACGACCGATTTGCACGTCAGGA 119 CCGCTACGGACCTCCAC 18S_50_376TCGAACCCTGATTCCCCGTCACCCGTGGTCACCA 120 TGGTAGGCACGGCGAC 
28S_50_3831TTCCCCGCTGATTCCGCCAAGCCCGTTCCCTTGG 121 CTGTGGTTTCGCTGGA 18S_50_903GCGGCGCAATACGAATGCCCCCGGCCGTCCCTC 122 TTAATCATGGCCTCAGT 18S_50_1223GGGCCGGGTGAGGTTTCCCGTGTTGAGTCAAAT 123 TAAGCCGCAGGCTCCAC 18S_50_827GGTCCTATTCCATTATTCCTAGCTGCGGTATCCA 124 GGCGGCTCGGGCCTGC 18S_50_595GCTATTGGAGCTGGAATTACCGCGGCTGCTGGC 125 ACCAGACTTGCCCTCCA 28S_50_4172GTCCTCCCTGAGCTCGCCTTAGGACACCTGCGTT 126 ACCGTTTGACAGGTGT 18S_50_1307TCGCTCCACCAACTAAGAACGGCCATGCACCAC 127 CACCCACGGAATCGAGA 28S_50_4967GGGCTGACTTTCAATAGATCGCAGCGAGGGAGC 128 TGCTCTGCTACGTACGA 28S_50_1948GTTACACACTCCTTAGCGGATTCCGACTTCCATG 129 GCCACCGTCCTGCTGT 28S_50_4602ATCCCACAGATGGTAGCTTCGCCCCATTGGCTCC 130 TCAGCCAAGCACATAC 18S_50_46GCCATTCGCAGTTTCACTGTACCGGCCGTGCGTA 131 CTTAGACATGCATGGC 18S_50_1669GGTAGTAGCGACGGGCGGTGTGTACAAAGGGCA 132 GGGACTTAATCAACGCA 28S_50_390GCGGACCCCACCCGTTTACCTCTTAACGGTTTCA 133 CGCCCTCTTGAACTCT 28S_50_2312TTGAATATTTGCTACTACCACCAAGATCTGCACC 134 TGCGGCGGCTCCACCC 28S_50_2832TTAGAGCCAATCCTTATCCCGAAGTTACGGATCC 135 GGCTTGCCGACTTCCC 28S_50_288TCGTGCCGGTATTTAGCCTTAGATGGAGTTTACC 136 ACCCGCTTTGGGCTGC 28S_50_3718GGCATTTGGCTACCTTAAGAGAGTCATAGTTACT 137 CCCGCCGTTTACCCGC 28S_50_4472AACCTGTCTCACGACGGTCTAAACCCAGCTCAC 138 GTTCCCTATTAGTGGGT 18S_50_144GGGTCAGCGCCCGTCGGCATGTATTAGCTCTAG 139 AATTACCACAGTTATCC 28S_50_4252GCCCCGCTTTCACGGTCTGTATTCGTACTGAAAA 140 TCAAGATCAAGCGAGC 28S_50_9TCCTCCGCTGACTAATATGCTTAAATTCAGCGGG 141 TCGCCACGTCTGATCT 18S_50_1438GTTATTGCTCAATCTCGGGTGGCTGAACGCCACT 142 TGTCCCTCTAAGAAGT 28S_50_1723TGAGAATAGGTTGAGATCGTTTCGGCCCCAAGA 143 CCTCTAATCATTCGCTT 18S_50_1003GTCTTCGAACCTCCGACTTTCGTTCTTGATTAAT 144 GAAAACATTCTTGGCA

Table 3 provides examples of full construct barcodes.

TABLE 3 Full construct barcodes

Name: Sequence (SEQ ID NO)
P7 no UMI: CAAGCAGAAGACGGCATACGAGATXXXXXXGTXXXXXXGTGACTGGAGTTCAGACGTGTGCTCTTCCGATCTrGrGrG (SEQ ID NO: 145)
P7 early UMI: CAAGCAGAAGACGGCATACGAGATNNNNNNNNNNXXXXXXGTXXXXXXGTGACTGGAGTTCAGACGTGTGCTCTTCCGATCTrGrGrG (SEQ ID NO: 146)
P5 no UMI: AATGATACGGCGACCACCGAGATCTACACXXXXXXGTXXXXXXACACTCTTTCCCTACACGACGCTCTTCCGATCTrGrGrG (SEQ ID NO: 147)
P5 early UMI: AATGATACGGCGACCACCGAGATCTACACNTNNNNNNNNNNXXXXXXGTXXXXXXACACTCTTTCCCTACACGACGCTCTTCCGATCTrGrGrG (SEQ ID NO: 148)
R1 inline no UMI: CTACACGACGCTCTTCCGATCTXXXXXXGTXXXXXXrGrGrG (SEQ ID NO: 149)
R1 inline late UMI: CTACACGACGCTCTTCCGATCTXXXXXXGTXXXXXXNNNNNNNNNNrGrGrG (SEQ ID NO: 150)
R1 late UMI AT rich: CTACACGACGCTCTTCCGATCTXXXXXXGTXXXXXX(N1: 25252525)N1N1N1N1(N2: 40101040)N2N2WWrGrGrG (SEQ ID NO: 151)
R1 inline early UMI: CTACACGACGCTCTTCCGATCTNNNNNNNNNNXXXXXXGTXXXXXXrGrGrG (SEQ ID NO: 152)
R1 inline late UMI Spacer 2: CTACACGACGCTCTTCCGATCTXXXXXXGTXXXXXXNNNNNNNNNNATrGrGrG (SEQ ID NO: 153)
R1 inline late UMI Spacer 4: CTACACGACGCTCTTCCGATCTXXXXXXGTXXXXXXNNNNNNNNNNACATrGrGrG (SEQ ID NO: 154)
R1 inline late UMI Spacer 6: CTACACGACGCTCTTCCGATCTXXXXXXGTXXXXXXNNNNNNNNNNTAACATrGrGrG (SEQ ID NO: 155)
R1 inline late UMI Spacer 8: CTACACGACGCTCTTCCGATCTXXXXXXGTXXXXXXNNNNNNNNNNCGTAACATrGrGrG (SEQ ID NO: 156)
R1 inline late UMI Spacer 10: CTACACGACGCTCTTCCGATCTXXXXXXGTXXXXXXNNNNNNNNNNACCGTAACATrGrGrG (SEQ ID NO: 157)
R1 inline late UMI Full Spacer: CTACACGACGCTCTTCCGATCTXXXXXXGTXXXXXXNNNNNNNNNNACACAAGAGGCACGCGTAACATrGrGrG (SEQ ID NO: 158)

X represents nucleotides that make up the barcode sequence. All oligos on one bead can have the same barcode sequence; oligos on different beads may have different barcode sequences. N and W represent any of {A,C,G,T} and any of {A,T}, respectively, that make up the UMI sequence. UMIs may be different across different oligos on the same bead. N1 is any one of {A,C,G,T}; N1 positions have ratios of 25%, 25%, 25% and 25% for the four nucleotides. N2 is any one of {A,C,G,T}; N2 positions have ratios of 40%, 10%, 10% and 40% for the four nucleotides.
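As an illustration of how the construct layouts in Table 3 might be decomposed in software, the following minimal sketch splits an oligonucleotide sequence following the "R1 inline late UMI" layout (SEQ ID NO: 150) into its barcode and UMI. The function name parse_oligo and the use of a regular expression are assumptions made for illustration and are not part of the disclosure; riboG bases are assumed to appear as ordinary G characters in the input string.

```python
import re

# Layout assumed from the "R1 inline late UMI" row of Table 3:
# R1 handle, 6 barcode bases (X), fixed "GT", 6 barcode bases (X), 10 UMI bases (N).
R1_HANDLE = "CTACACGACGCTCTTCCGATCT"
LAYOUT = re.compile("^" + R1_HANDLE + r"([ACGT]{6})GT([ACGT]{6})([ACGT]{10})")


def parse_oligo(seq):
    """Return (barcode, umi) for a matching sequence, or None otherwise.

    Hypothetical helper: anything after the UMI (e.g., the riboG stretch)
    is ignored.
    """
    match = LAYOUT.match(seq.upper())
    if match is None:
        return None
    barcode = match.group(1) + match.group(2)  # the 12 X positions
    umi = match.group(3)                       # the 10 N positions
    return barcode, umi
```

Other rows of Table 3 could be handled in the same way by reordering the groups in the pattern, for example by placing the UMI group before the barcode groups for the "early UMI" layouts.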

In some examples, a cell barcode may be a 16 base sequence that is a random choice from about 737,000 sequences. The length of the barcode (16) can be altered. The diversity of potential barcode sequences (737 k) can be altered. The defined nature of the barcode can be altered; for example, it may also be completely random (16 Ns) or semi-random (16 bases that come from a biased distribution of nucleotides).

The canonical UMI sequence may be a 10 nucleotide randomer. The length of the UMI can be altered. The random nature of the UMI can be altered; for example, it may be semi-random (bases that come from a biased distribution of nucleotides). In certain cases, the distribution of UMI nucleotides may be biased; for example, UMI sequences that do not contain Gs or Cs may be less likely to serve as primers.
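The difference between a random and a semi-random UMI can be sketched as follows. This is a minimal illustration only: the weight tables assume that the 25/25/25/25 and 40/10/10/40 ratios given for N1 and N2 positions apply to A, C, G and T in that order, and the names sample_umi and gc_fraction are hypothetical.

```python
import random

# Per-base probabilities; the A, C, G, T ordering of the ratios is an assumption.
UNIFORM = {"A": 0.25, "C": 0.25, "G": 0.25, "T": 0.25}
AT_RICH = {"A": 0.40, "C": 0.10, "G": 0.10, "T": 0.40}


def sample_umi(length, weights, rng):
    """Draw one UMI of the given length from a per-base distribution."""
    bases = list(weights)
    probs = [weights[b] for b in bases]
    return "".join(rng.choices(bases, weights=probs, k=length))


def gc_fraction(seq):
    """Fraction of G or C bases in a sequence."""
    return sum(base in "GC" for base in seq) / len(seq)


rng = random.Random(0)  # fixed seed so the sketch is repeatable
at_rich_umis = [sample_umi(10, AT_RICH, rng) for _ in range(2000)]
mean_gc = sum(gc_fraction(u) for u in at_rich_umis) / len(at_rich_umis)
```

Under these assumed weights, the mean GC fraction of the AT-rich UMIs is expected to be close to 0.2, compared with 0.5 for a uniform randomer.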

The spacer may be alterable within given or predetermined parameters. For example, one method may give an optimal sequence of TTTCTTATAT (SEQ ID NO: 159), but a slightly different optimization strategy may result in a sequence that is likely just as good or nearly as good.

The selected template switching region can comprise 3 or more consecutive riboGs. The selected template switching region can comprise 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14, 15, 16, 17, 18, 19, 20 or more consecutive riboGs. Alternative nucleotides may be used, such as deoxyribo Gs, LNA Gs, and potentially any combination thereof.

The present disclosure also provides methods of enriching cDNA sequences. Enrichment may be useful for TCR, BCR, and immunoglobulin gene analysis since these genes may possess similar yet polymorphic variable region sequences. These sequences can be responsible for antigen binding and peptide-MHC interactions. For example, due to gene recombination events in individual developing T cells, a single human or mouse will naturally express many thousands of different TCR genes. This T cell repertoire can exceed 100,000 different TCR rearrangements occurring during T cell development, yielding a total T cell population that is highly polymorphic with respect to its TCR gene sequences, especially for the variable region. For immunoglobulin genes, the same may apply, except even greater diversity may be present. As previously noted, each distinct sequence may correspond to a clonotype. In certain embodiments, enrichment increases accuracy and sensitivity of methods for sequencing TCR, BCR and immunoglobulin genes at a single cell level. In certain embodiments, enrichment increases the number of sequencing reads that map to a TCR, BCR, or immunoglobulin gene. In some embodiments, enrichment leads to greater than or equal to 25%, 30%, 35%, 40%, 45%, 50%, 55%, 60%, 65%, 70%, 75%, 80%, 85%, 90%, 95% or more of total sequencing reads mapping to a TCR, BCR or immunoglobulin gene. In some embodiments, enrichment leads to greater than or equal to 25%, 30%, 35%, 40%, 45%, 50%, 55%, 60%, 65%, 70%, 75%, 80%, 85%, 90%, 95% or more of total sequencing reads mapping to a variable region of a TCR, BCR or immunoglobulin gene.

To aid in sequencing, detection, and analysis of sequences of interest, an enrichment step can be employed. Enrichment may be useful for the sequencing and analysis of genes that may be related yet highly polymorphic. In some embodiments, an enriched gene comprises a TCR sequence, a BCR sequence, or an immunoglobulin sequence. In some embodiments, an enriched gene comprises a mitochondrial gene or a cytochrome family gene. In some embodiments, enrichment is employed after an initial round of reverse transcription (e.g., cDNA production). In some embodiments, enrichment is employed after an initial round of reverse transcription and cDNA amplification for at least 5, 10, 15, 20, 25, 30, 40 or more cycles. In some embodiments, enrichment is employed after a cDNA amplification. In some embodiments, the amplified cDNA can be subjected to a clean-up step before the enrichment step using a column, gel extraction, or beads in order to remove unincorporated primers, unincorporated nucleotides, very short or very long nucleic acid fragments, and enzymes. In some embodiments, enrichment is followed by a clean-up step before sequencing library preparation.

Enrichment of gene or cDNA sequences can be facilitated by a primer that anneals within a known sequence of the target gene. In some embodiments, for enrichment of a TCR, BCR, or immunoglobulin gene, a primer that anneals to a constant region of the gene or cDNA can be paired with a sequencing primer that anneals to a TSO functional sequence. In some embodiments, the enriched cDNA falls into a length range that approximately corresponds to that gene's variable region. In some embodiments, greater than about 50%, 60%, 70%, 80%, 85%, 90%, 95% or more of the cDNA or cDNA fragments fall within a range of about 300 base pairs to about 900 base pairs, of about 400 base pairs to about 800 base pairs, of about 500 base pairs to about 700 base pairs, or of about 500 base pairs to about 600 base pairs.

FIG. 20 shows an example enrichment scheme. In operation 2001, an oligonucleotide with a poly-T sequence 2014, and in some cases an additional sequence 2016 that binds to, for example, a sequencing or PCR primer, anneals to a target RNA 2020. In operation 2002, the oligonucleotide is extended, yielding an anti-sense strand 2022 which is appended by multiple cytidines on the 3′ end. A template switching oligonucleotide attached to a gel bead 2038 is provided, and a riboG of the TSO pairs with the cytidines of the anti-sense strand and is extended to create a sense and an antisense strand. In some cases, the template switching oligonucleotide is released from the gel bead during extension. In some cases, the template switching oligonucleotide is released from the gel bead prior to extension. In some cases, the template switching oligonucleotide is released from the gel bead after extension. In addition to the riboG sequence, the TSO comprises a barcode 2012 and one or two additional functional sequences 2008 and 2010. The additional functional sequences can comprise a P7 or R2 sequence for attachment to an Illumina sequencing flow cell, for example. Operations 2001 and 2002 may be performed in a partition (e.g., droplet or well). Subsequent to operation 2002, the nucleic acid product from operations 2001 and 2002 may be removed from the partition and in some cases pooled with other products from other partitions for subsequent processing.

Next, additional functional sequences can be added that allow for amplification or sample identification. This may occur in a partition or in bulk. This reaction yields amplified cDNA molecules as in 2003 which are mixed templates comprising a barcode and sequencing primers. In some cases, not all of these cDNA molecules will comprise a target variable region sequence. In one enrichment scheme, shown in operation 2004, a primer 2018 that anneals to a sequence 3′ of a TCR, BCR or immunoglobulin variable region 2020 specifically amplifies the variable region-comprising cDNAs, yielding products as shown in operation 2005. Such enrichment may be performed for various approaches described herein, such as, e.g., the approaches described above in the context of FIGS. 19A and 19B.

In certain aspects, primer 2018 anneals in a constant region of a TCR (e.g., TCR-alpha or TCR-beta), BCR or immunoglobulin gene. After amplification, the products are sheared, ligated to adaptors, and amplified a second time to add additional functional sequences 2007 and 2011 and a sample index 2009 as shown in operation 2006. The additional functional sequences can functionally complement the first pair 2008 and 2010 and comprise, for example, a P5 or R1 sequence. FIG. 21 shows example size distributions after cDNA amplification but before enrichment (A), after enrichment but before sequencing library preparation (B), and after sequencing library preparation (C). In some embodiments, the initial poly-T primer, comprising sequences 2016 and 2014, can be attached to a gel bead as opposed to the TSO. In some embodiments, the poly-T comprising primer comprises functional sequences and barcode sequences 2008, 2010, 2012, and the TSO comprises sequence 2016. Operations 2003-2006 may be performed in bulk.

In some embodiments, clonotype information derived from next-generation sequencing data of cDNA prepared from cellular RNA is combined with other targeted or non-targeted cDNA enrichment to illuminate functional and ontological aspects of B cells and T cells that express a given TCR, BCR, or immunoglobulin. In some embodiments, clonotype information is combined with analysis of expression of an immunologically relevant cDNA. In some embodiments, the cDNA encodes a cell lineage marker, a cell surface functional marker, an immunoglobulin isotype, a cytokine and/or chemokine, an intracellular signaling polypeptide, a cell metabolism polypeptide, a cell-cycle polypeptide, an apoptosis polypeptide, a transcriptional activator/inhibitor, an miRNA or an lncRNA.

Also disclosed herein are methods and systems for reference-free clonotype identification. Such methods may be implemented by way of software executing algorithms. Tools for assembling T-cell Receptor (TCR) sequences may use known sequences of V and C regions to “anchor” assemblies. This may make such tools applicable only to organisms with well characterized references (human and mouse). However, most mammalian T cell receptors have similar amino acid motifs and similar structure. In the absence of a reference, a method can scan assembled transcripts for regions that are diverse or semi-diverse, find the junction region, which should be highly diverse, and then scan for known amino acid motifs. In some cases, it may not be critical that the complementarity-determining regions (CDRs), such as the CDR1, CDR2, or CDR3 region, be accurately delimited, only that a diverse sequence is found that can uniquely identify the clonotype. One advantage of this method is that the software may not require a set of reference sequences and can operate fully de novo; thus, this method can enable immune research in eukaryotes with poorly characterized genomes/transcriptomes.
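The diversity scan described above can be sketched with a per-position Shannon entropy. This is a minimal illustration under strong assumptions that are not taken from the disclosure: the assembled transcripts are taken to be already aligned to a common coordinate system and of equal length, and the function names column_entropy and most_diverse_window are hypothetical. The subsequent search for known amino acid motifs is not shown.

```python
import math
from collections import Counter


def column_entropy(column):
    """Shannon entropy (bits) of the bases observed at one aligned position."""
    counts = Counter(column)
    total = len(column)
    return -sum((n / total) * math.log2(n / total) for n in counts.values())


def most_diverse_window(aligned, width):
    """Return (start, entropies) for the window with the highest summed entropy.

    `aligned` is a list of equal-length, pre-aligned sequences (an assumption
    of this sketch); `width` is the window size in positions.
    """
    length = len(aligned[0])
    if any(len(seq) != length for seq in aligned):
        raise ValueError("sequences must be aligned to equal length")
    entropies = [column_entropy([seq[i] for seq in aligned]) for i in range(length)]
    best_start = max(
        range(length - width + 1),
        key=lambda i: sum(entropies[i:i + width]),
    )
    return best_start, entropies
```

A window found in this way is only a candidate for the junction region; in the scheme described above it would then be checked against known amino acid motifs.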

The methods described herein allow single-cell gene expression information to be obtained simultaneously with single-cell immune receptor sequences (TCRs/BCRs). This can be achieved using the methods described herein, such as by amplifying genes relevant to lymphocyte function and state (either in a targeted or unbiased way) while simultaneously amplifying the TCR/BCR sequences for clonotyping. This can allow such applications as 1) interrogating changes in lymphocyte activation/response to an antigen, at the single clonotype or single cell level; or 2) classifying lymphocytes into subtypes based on gene expression while simultaneously sequencing their TCR/BCRs. UMIs are typically ignored during TCR (or generally transcriptome) assembly.

Key analytical operations involved in clonotype sequencing according to the methods described herein include: 1) Assemble each UMI separately, then merge highly similar assembled sequences. High depth per molecule in TCR sequencing makes this feasible. This may result in a reduced chance of “chimeric” assemblies; 2) Assemble all UMIs from each cell together but use UMI information to choose paths in the assembly graph. This is analogous to using barcode and read-pair information to resolve “bubbles” in WGS assemblies; 3) Base quality estimation. UMI information and alignment of short reads to assembled contigs may be used to compute per-base quality scores. Base quality scoring may be important as a few base differences in a CDR sequence may differentiate one clonotype from another. This may be in contrast to other methods that rely on using long-read sequencing.
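The idea of processing each UMI separately and deriving a per-base confidence can be sketched as follows. This is a deliberately simplified stand-in, not an assembler: reads sharing a UMI are assumed to be pre-aligned and of equal length, a majority vote replaces true assembly, and the fraction of agreeing reads is used as a crude proxy for a per-base quality score. The names consensus and per_umi_consensus are hypothetical.

```python
from collections import Counter, defaultdict


def consensus(reads):
    """Majority-vote consensus and per-base support for equal-length reads."""
    bases, support = [], []
    for column in zip(*reads):
        base, count = Counter(column).most_common(1)[0]
        bases.append(base)
        support.append(count / len(column))  # fraction of reads agreeing
    return "".join(bases), support


def per_umi_consensus(tagged_reads):
    """Group (umi, sequence) pairs by UMI and build one consensus per UMI."""
    by_umi = defaultdict(list)
    for umi, seq in tagged_reads:
        by_umi[umi].append(seq)
    return {umi: consensus(seqs) for umi, seqs in by_umi.items()}
```

Consensus sequences from different UMIs that are highly similar could then be merged, and positions with low support flagged for the clonotype inference step.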

Thus, base quality estimates for assembled contigs can inform clonotype inference. Errors can make cells with the same (real) clonotype have mismatching assembled sequences. Further, base-quality estimates and clonotype abundances can be combined to correct clonotype assignments. For example, if 10 cells have clonotype X and one cell has a clonotype that differs from X in only a few bases and these bases have low quality, then this cell may be assigned to clonotype X. In some embodiments, clonotypes that differ by a single amino acid or nucleic acid may be discriminated. In some embodiments, clonotypes that differ by less than 50, 40, 30, 20, 15, 10, 9, 8, 7, 6, 5, 4, 3, or 2 amino acids or nucleic acids may be discriminated. An example, non-limiting, base error calculation scheme is shown below in Example VII.
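The correction rule in the preceding example (reassigning a cell to an abundant clonotype when the only differences are low-quality bases) can be sketched as follows. The thresholds max_diffs and min_qual, the use of Phred-like integer qualities, and the function names are assumptions chosen for illustration; this is not the calculation scheme of Example VII.

```python
def mismatch_positions(a, b):
    """Positions at which two equal-length sequences differ."""
    return [i for i, (x, y) in enumerate(zip(a, b)) if x != y]


def correct_clonotype(seq, quals, abundant, max_diffs=2, min_qual=20):
    """Reassign `seq` to an abundant clonotype if every mismatch is low quality.

    `quals` holds one quality value per base of `seq`; `abundant` is a list of
    clonotype sequences already supported by many cells. Threshold values are
    hypothetical.
    """
    for ref in abundant:
        if len(ref) != len(seq):
            continue  # this sketch compares equal-length sequences only
        diffs = mismatch_positions(seq, ref)
        if 0 < len(diffs) <= max_diffs and all(quals[i] < min_qual for i in diffs):
            return ref
    return seq
```

A sequence whose mismatches are all supported by high-quality bases is left unchanged, so that genuinely distinct clonotypes differing by a single base can still be discriminated.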

Also provided herein are the microfluidic devices used for partitioning the cells as described above. Such microfluidic devices can comprise channel networks for carrying out the partitioning process like those set forth in FIGS. 1 and 2. Briefly, these microfluidic devices can comprise channel networks, such as those described herein, for partitioning cells into separate partitions, and co-partitioning such cells with oligonucleotide barcode library members, e.g., disposed on beads. These channel networks can be disposed within a solid body, e.g., a glass, semiconductor or polymer body structure in which the channels are defined, where those channels communicate at their termini with reservoirs for receiving the various input fluids, and for the ultimate deposition of the partitioned cells, etc., from the output of the channel networks. By way of example, and with reference to FIG. 2, a reservoir fluidly coupled to channel 202 may be provided with an aqueous suspension of cells 214, while a reservoir coupled to channel 204 may be provided with an aqueous suspension of beads 216 carrying the oligonucleotides. Channel segments 206 and 208 may be provided with a non-aqueous solution, e.g., an oil, into which the aqueous fluids are partitioned as droplets at the channel junction 212. Finally, an outlet reservoir may be fluidly coupled to channel 210 into which the partitioned cells and beads can be delivered and from which they may be harvested. As will be appreciated, while described as reservoirs, the channel segments may be coupled to any of a variety of different fluid sources or receiving components, including tubing, manifolds, or fluidic components of other systems.

Also provided are systems that control flow of these fluids through the channel networks, e.g., through applied pressure differentials, centrifugal force, electrokinetic pumping, capillary or gravity flow, or the like.

Also provided herein are kits for analyzing individual cells or small populations of cells. The kits may include one, two, three, four, five or more, up to all of: partitioning fluids, including both aqueous buffers and non-aqueous partitioning fluids or oils; nucleic acid barcode libraries that are releasably associated with beads, as described herein; microfluidic devices; reagents for disrupting cells, amplifying nucleic acids, and providing additional functional sequences on fragments of cellular nucleic acids or replicates thereof; as well as instructions for using any of the foregoing in the methods described herein.

The present disclosure provides computer control systems that are programmed to implement methods of the disclosure. FIG. 17 shows a computer system 1701 that is programmed or otherwise configured to implement methods of the disclosure including nucleic acid sequencing methods, interpretation of nucleic acid sequencing data and analysis of cellular nucleic acids, such as RNA (e.g., mRNA), and characterization of cells from sequencing data. The computer system 1701 can be an electronic device of a user or a computer system that is remotely located with respect to the electronic device. The electronic device can be a mobile electronic device.

The computer system 1701 includes a central processing unit (CPU, also “processor” and “computer processor” herein) 1705, which can be a single core or multi core processor, or a plurality of processors for parallel processing. The computer system 1701 also includes memory or memory location 1710 (e.g., random-access memory, read-only memory, flash memory), electronic storage unit 1715 (e.g., hard disk), communication interface 1720 (e.g., network adapter) for communicating with one or more other systems, and peripheral devices 1725, such as cache, other memory, data storage and/or electronic display adapters. The memory 1710, storage unit 1715, interface 1720 and peripheral devices 1725 are in communication with the CPU 1705 through a communication bus (solid lines), such as a motherboard. The storage unit 1715 can be a data storage unit (or data repository) for storing data. The computer system 1701 can be operatively coupled to a computer network (“network”) 1730 with the aid of the communication interface 1720. The network 1730 can be the Internet, an internet and/or extranet, or an intranet and/or extranet that is in communication with the Internet. The network 1730 in some cases is a telecommunication and/or data network. The network 1730 can include one or more computer servers, which can enable distributed computing, such as cloud computing. The network 1730, in some cases with the aid of the computer system 1701, can implement a peer-to-peer network, which may enable devices coupled to the computer system 1701 to behave as a client or a server.

The CPU 1705 can execute a sequence of machine-readable instructions, which can be embodied in a program or software. The instructions may be stored in a memory location, such as the memory 1710. The instructions can be directed to the CPU 1705, which can subsequently program or otherwise configure the CPU 1705 to implement methods of the present disclosure. Examples of operations performed by the CPU 1705 can include fetch, decode, execute, and writeback.

The CPU 1705 can be part of a circuit, such as an integrated circuit. One or more other components of the system 1701 can be included in the circuit. In some cases, the circuit is an application specific integrated circuit (ASIC).

The storage unit 1715 can store files, such as drivers, libraries and saved programs. The storage unit 1715 can store user data, e.g., user preferences and user programs. The computer system 1701 in some cases can include one or more additional data storage units that are external to the computer system 1701, such as located on a remote server that is in communication with the computer system 1701 through an intranet or the Internet.

The computer system 1701 can communicate with one or more remote computer systems through the network 1730. For instance, the computer system 1701 can communicate with a remote computer system of a user. Examples of remote computer systems include personal computers (e.g., portable PC), slate or tablet PCs (e.g., Apple® iPad, Samsung® Galaxy Tab), telephones, smart phones (e.g., Apple® iPhone, Android-enabled device, Blackberry®), or personal digital assistants. The user can access the computer system 1701 via the network 1730.

Methods as described herein can be implemented by way of machine (e.g., computer processor) executable code stored on an electronic storage location of the computer system 1701, such as, for example, on the memory 1710 or electronic storage unit 1715. The machine executable or machine readable code can be provided in the form of software. During use, the code can be executed by the processor 1705. In some cases, the code can be retrieved from the storage unit 1715 and stored on the memory 1710 for ready access by the processor 1705. In some situations, the electronic storage unit 1715 can be precluded, and machine-executable instructions are stored on memory 1710.

The code can be pre-compiled and configured for use with a machine having a processor adapted to execute the code, or can be compiled during runtime. The code can be supplied in a programming language that can be selected to enable the code to execute in a pre-compiled or as-compiled fashion.

Aspects of the systems and methods provided herein, such as the computer system 1701, can be embodied in programming. Various aspects of the technology may be thought of as “products” or “articles of manufacture” typically in the form of machine (or processor) executable code and/or associated data that is carried on or embodied in a type of machine readable medium. Machine-executable code can be stored on an electronic storage unit, such as memory (e.g., read-only memory, random-access memory, flash memory) or a hard disk. “Storage” type media can include any or all of the tangible memory of the computers, processors or the like, or associated modules thereof, such as various semiconductor memories, tape drives, disk drives and the like, which may provide non-transitory storage at any time for the software programming. All or portions of the software may at times be communicated through the Internet or various other telecommunication networks. Such communications, for example, may enable loading of the software from one computer or processor into another, for example, from a management server or host computer into the computer platform of an application server. Thus, another type of media that may bear the software elements includes optical, electrical and electromagnetic waves, such as used across physical interfaces between local devices, through wired and optical landline networks and over various air-links. The physical elements that carry such waves, such as wired or wireless links, optical links or the like, also may be considered as media bearing the software. As used herein, unless restricted to non-transitory, tangible “storage” media, terms such as computer or machine “readable medium” refer to any medium that participates in providing instructions to a processor for execution.

Hence, a machine readable medium, such as computer-executable code, may take many forms, including but not limited to, a tangible storage medium, a carrier wave medium or physical transmission medium. Non-volatile storage media include, for example, optical or magnetic disks, such as any of the storage devices in any computer(s) or the like, such as may be used to implement the databases, etc. shown in the drawings. Volatile storage media include dynamic memory, such as main memory of such a computer platform. Tangible transmission media include coaxial cables; copper wire and fiber optics, including the wires that comprise a bus within a computer system. Carrier-wave transmission media may take the form of electric or electromagnetic signals, or acoustic or light waves such as those generated during radio frequency (RF) and infrared (IR) data communications. Common forms of computer-readable media therefore include for example: a floppy disk, a flexible disk, hard disk, magnetic tape, any other magnetic medium, a CD-ROM, DVD or DVD-ROM, any other optical medium, punch cards, paper tape, any other physical storage medium with patterns of holes, a RAM, a ROM, a PROM and EPROM, a FLASH-EPROM, any other memory chip or cartridge, a carrier wave transporting data or instructions, cables or links transporting such a carrier wave, or any other medium from which a computer may read programming code and/or data. Many of these forms of computer readable media may be involved in carrying one or more sequences of one or more instructions to a processor for execution.

The computer system 1701 can include or be in communication with an electronic display 1735 that comprises a user interface (UI) 1740 for providing, for example, results of nucleic acid sequencing, analysis of nucleic acid sequencing data, characterization of nucleic acid sequencing samples, cell characterizations, etc. Examples of UIs include, without limitation, a graphical user interface (GUI) and web-based user interface.

Methods and systems of the present disclosure can be implemented by way of one or more algorithms. An algorithm can be implemented by way of software upon execution by the central processing unit 1705. The algorithm can, for example, initiate nucleic acid sequencing, process nucleic acid sequencing data, interpret nucleic acid sequencing results, characterize nucleic acid samples, characterize cells, etc.

EXAMPLES

The following non-limiting examples are given for the purpose of illustrating various embodiments of the present disclosure.

Example I: Cellular RNA Analysis Using Emulsions

In an example, reverse transcription with template switching and cDNA amplification (via PCR) is performed in emulsion droplets with operations as shown in FIG. 9A. The reaction mixture that is partitioned for reverse transcription and cDNA amplification (via PCR) includes 1,000 cells or 10,000 cells or 10 ng of RNA, beads bearing barcoded oligonucleotides/0.2% Tx-100/5× Kapa buffer, 2× Kapa HS HiFi Ready Mix, 4 μM switch oligo, and Smartscribe. Where cells are present, the mixture is partitioned such that a majority or all of the droplets comprise a single cell and single bead. The cells are lysed while the barcoded oligonucleotides are released from the bead, and the poly-T segment of the barcoded oligonucleotide hybridizes to the poly-A tail of mRNA that is released from the cell as in operation 950. The poly-T segment is extended in a reverse transcription reaction as in operation 952 and the cDNA transcript is amplified as in operation 954. The thermal cycling conditions are 42° C. for 130 minutes; 98° C. for 2 min; and 35 cycles of the following: 98° C. for 15 sec, 60° C. for 20 sec, and 72° C. for 6 min. Following thermal cycling, the emulsion is broken and the transcripts are purified with Dynabeads and 0.6×SPRI as in operation 956.
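As a worked illustration of the thermal cycling program stated above, the following sketch adds up the programmed hold times. The function name program_minutes is hypothetical, and the calculation ignores ramp times between temperatures, which depend on the instrument.

```python
def program_minutes(hold_steps_min, cycles, cycle_steps_sec):
    """Total programmed hold time in minutes, excluding temperature ramps."""
    return sum(hold_steps_min) + cycles * sum(cycle_steps_sec) / 60.0


# 42 C for 130 min; 98 C for 2 min; 35 cycles of 15 sec, 20 sec and 6 min.
total = program_minutes([130, 2], 35, [15, 20, 6 * 60])
hours = total / 60.0
```

With these values the 35 cycles alone account for 35 × 395 sec, or about 230 minutes, so the programmed hold time is roughly 362 minutes (about 6 hours) in total.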

The yield from template switch reverse transcription and PCR in emulsions is shown for 1,000 cells in FIG. 13A, for 10,000 cells in FIG. 13C, and for 10 ng of RNA in FIG. 13B (Smartscribe line). The cDNA transcripts from RT and PCR performed in emulsions for 10 ng RNA are sheared and ligated to functional sequences, cleaned up with 0.8×SPRI, and further amplified by PCR as in operation 958. The amplification product is cleaned up with 0.8×SPRI. The yield from this processing is shown in FIG. 13B (SSII line).

Example II: Cellular RNA Analysis Using Emulsions

In another example, reverse transcription with template switching and cDNA amplification (via PCR) is performed in emulsion droplets with operations as shown in FIG. 9A. The reaction mixture that is partitioned for reverse transcription and cDNA amplification (via PCR) includes Jurkat cells, beads bearing barcoded oligonucleotides/0.2% Triton X-100/5× Kapa buffer, 2× Kapa HS HiFi Ready Mix, 4 μM switch oligo, and Smartscribe. The mixture is partitioned such that a majority or all of the droplets comprise a single cell and single bead. The cells are lysed while the barcoded oligonucleotides are released from the bead, and the poly-T segment of the barcoded oligonucleotide hybridizes to the poly-A tail of mRNA that is released from the cell as in operation 950. The poly-T segment is extended in a reverse transcription reaction as in operation 952 and the cDNA transcript is amplified as in operation 954. The thermal cycling conditions are 42° C. for 130 minutes; 98° C. for 2 min; and 35 cycles of the following: 98° C. for 15 sec, 60° C. for 20 sec, and 72° C. for 6 min. Following thermal cycling, the emulsion is broken and the transcripts are cleaned up with Dynabeads and 0.6×SPRI as in operation 956. The yield from reactions with various cell numbers (625 cells, 1,250 cells, 2,500 cells, 5,000 cells, and 10,000 cells) is shown in FIG. 14A. These yields are confirmed with GAPDH qPCR assay results shown in FIG. 14B.

Example III: RNA Analysis Using Emulsions

In another example, reverse transcription is performed in emulsion droplets and cDNA amplification is performed in bulk in a manner similar to that shown in FIG. 9C. The reaction mixture that is partitioned for reverse transcription includes beads bearing barcoded oligonucleotides, 10 ng Jurkat RNA (e.g., Jurkat mRNA), 5× First-Strand buffer, and Smartscribe. The barcoded oligonucleotides are released from the bead, and the poly-T segment of the barcoded oligonucleotide hybridizes to the poly-A tail of the RNA as in operation 961. The poly-T segment is extended in a reverse transcription reaction as in operation 963. The thermal cycling conditions for reverse transcription are one cycle at 42° C. for 2 hours and one cycle at 70° C. for 10 min. Following thermal cycling, the emulsion is broken and RNA and cDNA transcripts are denatured as in operation 962. A second strand is then synthesized by primer extension with a primer having a biotin tag as in operation 964. The reaction conditions for this primer extension include cDNA transcript as the first strand and biotinylated extension primer ranging in concentration from 0.5-3.0 μM. The thermal cycling conditions are one cycle at 98° C. for 3 min and one cycle of 98° C. for 15 sec, 60° C. for 20 sec, and 72° C. for 30 min. Following primer extension, the second strand is pulled down with Dynabeads MyOne Streptavidin C1 and T1, and cleaned up with Agilent SureSelect XT buffers. The second strand is pre-amplified via PCR as in operation 965 with the following cycling conditions: one cycle at 98° C. for 3 min and one cycle of 98° C. for 15 sec, 60° C. for 20 sec, and 72° C. for 30 min. The yield for various concentrations of biotinylated primer (0.5 μM, 1.0 μM, 2.0 μM, and 3.0 μM) is shown in FIG. 15.

Example IV: RNA Analysis Using Emulsions

In another example, in vitro transcription by T7 polymerase is used to produce RNA transcripts as shown in FIG. 10. The mixture that is partitioned for reverse transcription includes beads bearing barcoded oligonucleotides which also include a T7 RNA polymerase promoter sequence, 10 ng human RNA (e.g., human mRNA), 5× First-Strand buffer, and Smartscribe. The mixture is partitioned such that a majority or all of the droplets comprise a single bead. The barcoded oligonucleotides are released from the bead, and the poly-T segment of the barcoded oligonucleotide hybridizes to the poly-A tail of the RNA as in operation 1050. The poly-T segment is extended in a reverse transcription reaction as in operation 1052. The thermal cycling conditions are one cycle at 42° C. for 2 hours and one cycle at 70° C. for 10 min. Following thermal cycling, the emulsion is broken and the remaining operations are performed in bulk. A second strand is then synthesized by primer extension as in operation 1054. The reaction conditions for this primer extension include cDNA transcript as template and extension primer. The thermal cycling conditions are one cycle at 98° C. for 3 min and one cycle of 98° C. for 15 sec, 60° C. for 20 sec, and 72° C. for 30 min. Following this primer extension, the second strand is purified with 0.6×SPRI. As in operation 1056, in vitro transcription is then performed to produce RNA transcripts. In vitro transcription is performed overnight, and the transcripts are purified with 0.6×SPRI. The RNA yields from in vitro transcription are shown in FIG. 16.

Example V: Analysis of T-Cell Receptors (TCRs)

In this example, methods disclosed herein are used to assay T-cell receptors. To generate labeled polynucleotides comprising T-cell receptor gene sequences, T-cells are co-partitioned with gel beads comprising barcoded template switching oligonucleotides. Prior to partitioning, T-cells are optionally enriched from a cell sample, for example by fluorescence activated cell sorting (FACS) or other sorting technique. Additional reagents for generating labeled polynucleotides including, but not limited to, reverse transcriptase enzyme, poly(dT) primer, and dNTPs, are delivered to partitions as part of a master mix. Within partitions, cells are lysed, thereby yielding template polynucleotides comprising nucleic acids from T-cells. As illustrated schematically in FIG. 19, a T-cell derived template polynucleotide comprising mRNA (e.g., 1920), poly(dT) primer (e.g., 1924), and a template switching oligonucleotide (e.g., 1926) are subjected to an amplification reaction to yield a first amplification product. The poly(dT) primer hybridizes to the polyA tail of the mRNA template polynucleotide and acts as a primer for reverse transcription by the reverse transcriptase enzyme that is co-partitioned with the T-cell (e.g., 1950). The reverse transcriptase enzyme has terminal transferase activity and adds additional nucleotides, e.g., polyC, to the cDNA transcript in a template independent manner. The template switching oligonucleotide (e.g., 1926) hybridizes to the cDNA transcript and facilitates template switching in the first amplification reaction (e.g., 1952).

Using methods disclosed herein, reverse transcription performed within partitions generates unbiased cDNA comprising a sequencing adapter, a cell barcode and a unique molecular identifier (UMI) on the 5′ end of the transcript. To enrich for transcripts comprising TCR gene sequences, the first amplification reaction product or cDNA transcript is subjected to a second amplification reaction to generate a second amplification product. Polymerase chain reaction (PCR) is performed with one primer for the 5′ end of the transcript and one or more primers for the desired TCR/Ig constant region(s) (e.g., primers targeting TCR alpha (α) and/or beta (β) chain, and in some cases gamma and/or delta (γ/δ) chains). The contents of multiple partitions can be combined such that the second amplification reaction is performed in bulk.

Next, amplification products are subjected to enzymatic fragmentation and further processed to attach sequencing adaptors to generate a sequencing library. Additional sequences include functional sequences such as flow cell attachment sequences and sequencing primer binding sequences. The labeled polynucleotides are sequenced to yield sequencing reads, and sequencing reads are used to assemble full or partial TCR gene sequences. Additional analysis includes transcript counting, for which an analysis pipeline may include, for example, (i) barcode processing, (ii) read filtering, (iii) cell-by-cell consensus assembly, (iv) V(D)J annotation, and (v) clonotype inference and clustering.
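The first two pipeline stages named above (barcode processing and read filtering), together with per-cell transcript counting by UMI, can be sketched in Python as follows. This is a minimal illustration under stated assumptions, not the pipeline of the disclosure: the read layout (a 16-nucleotide barcode followed by a 10-nucleotide UMI), the barcode whitelist, the minimum insert length, and the function names are all hypothetical. Consensus assembly, V(D)J annotation, and clonotype inference are not modeled.

```python
from collections import defaultdict

BARCODE_LEN = 16  # assumed length; not specified in the text
UMI_LEN = 10      # assumed length; not specified in the text


def split_read(seq):
    """Split a read into (barcode, UMI, insert) using the assumed layout."""
    barcode = seq[:BARCODE_LEN]
    umi = seq[BARCODE_LEN:BARCODE_LEN + UMI_LEN]
    insert = seq[BARCODE_LEN + UMI_LEN:]
    return barcode, umi, insert


def group_reads(reads, whitelist, min_insert=20):
    """Group inserts by cell barcode and UMI, dropping reads that fail filters."""
    cells = defaultdict(lambda: defaultdict(list))
    for seq in reads:
        barcode, umi, insert = split_read(seq)
        # Filters: unknown barcode, ambiguous UMI base, or too-short insert.
        if barcode not in whitelist or "N" in umi or len(insert) < min_insert:
            continue
        cells[barcode][umi].append(insert)
    return cells


def umi_counts(cells):
    """Count distinct UMIs per barcode; duplicate reads of a UMI count once."""
    return {barcode: len(umis) for barcode, umis in cells.items()}


bc_a, bc_b = "ACGT" * 4, "TTGG" * 4
reads = [
    bc_a + "AAAAACCCCC" + "G" * 30,
    bc_a + "AAAAACCCCC" + "G" * 30,      # same UMI: counted once
    bc_a + "TTTTTGGGGG" + "C" * 30,
    bc_b + "ACACACACAC" + "T" * 30,
    "NNNN" * 4 + "ACACACACAC" + "T" * 30,  # barcode not in whitelist
    bc_b + "ACACANACAC" + "T" * 30,      # ambiguous base in UMI
]
counts = umi_counts(group_reads(reads, whitelist={bc_a, bc_b}))
print(counts)
```

Grouping by barcode first and UMI second mirrors the idea that reads sharing both identifiers derive from one original transcript in one cell, so they contribute a single count.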

Other receptors (e.g., B-cell receptors (BCRs) and Ig receptors) can be similarly analyzed using the methods described herein by partitioning the appropriate immune cell type for generating labeled polynucleotides and using receptor specific primers to generate amplification products.

Example VI: Enrichment of T-Cell Receptor (TCR) Transcripts

In this example, cellular suspensions of 3,000; 6,000; or 12,000 primary human T cells were loaded on a GemCode Single Cell Instrument (10x Genomics, Pleasanton, Calif.) to generate single cell-gel bead emulsions (SC-GEMs). The gel beads were modified to carry a template switching oligonucleotide (TSO) as shown in FIG. 18 or FIG. 20 at 8 μM, yielding a final concentration of 0.32 μM in GEM. After creation of SC-GEMs, reverse transcription was performed on the cells in emulsion using a poly-T primer and reverse transcriptase for 5 minutes at 55° C., followed by 1 hour and 55 minutes at 52° C. After RT, GEMs were broken and the single-stranded cDNA was cleaned up with DynaBeads® MyOne™ Silane Beads and SPRIselect Reagent Kit (0.6×SPRI). cDNA was amplified for 15 cycles with 1 minute extensions and amplified cDNA product was cleaned up with the SPRIselect Reagent Kit (0.6×SPRI). FIG. 22 shows the cDNA yields from all three cellular suspensions. The cDNA yield from 12,000 cells was greater than that from either 6,000 or 3,000 cells, which yielded similar amounts.

Indexed sequencing libraries were constructed using the GemCode Single Cell 3′ Library Kit, following these steps: 1) end repair and A-tailing; 2) adapter ligation; 3) post-ligation cleanup with SPRIselect; 4) sample index PCR and cleanup. These sequencing libraries were sequenced using an Illumina MiSeq sequencer. Sequencing performance of poly-T primed libraries was compared to that of libraries constructed from enriched cDNA, which was created by an enrichment priming method that replaced the poly-T primer with primers that bound the constant region of TCRα, TCRβ, or both. FIG. 23 shows that enrichment led to reduced mapping of sequencing reads to the transcriptome (8.9% unmappable reads with poly-T priming versus 49.3% for TCRα priming, 45.7% for TCRβ priming, or 39% for both). However, more reads mapped to the VDJ regions of TCR transcripts, indicating that enrichment is important for targeted VDJ sequencing (0.3% VDJ mappable fragments with poly-T priming versus 15.5% for TCRα priming, 19.7% for TCRβ priming, or 29.50% for both). See FIG. 23, Fraction fragments mapped to VDJ column.
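To make the trade-off in these figures concrete, the following Python sketch computes the fold change in VDJ-mappable fragments relative to poly-T priming, alongside the unmappable read fraction, using only the percentages stated above. The dictionary layout and variable names are illustrative assumptions.

```python
# Percentages as reported for FIG. 23:
# (unmappable reads %, VDJ-mappable fragments %)
priming = {
    "poly-T": (8.9, 0.3),
    "TCR alpha": (49.3, 15.5),
    "TCR beta": (45.7, 19.7),
    "TCR alpha + beta": (39.0, 29.5),
}

baseline_vdj = priming["poly-T"][1]

# Fold enrichment of VDJ-mappable fragments over the poly-T baseline.
fold = {
    name: vdj / baseline_vdj
    for name, (_, vdj) in priming.items()
    if name != "poly-T"
}

for name, value in fold.items():
    unmappable = priming[name][0]
    print(f"{name}: {value:.1f}x VDJ enrichment, {unmappable}% unmappable")
```

On these numbers, constant-region priming raises the VDJ-mappable fraction by roughly 50-fold to 100-fold, at the cost of a several-fold higher share of unmappable reads.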

In order to increase cDNA yields prior to sequencing library preparation, differing concentrations of TSO were tested. TSO was tested at concentrations of 32, 16, 8, 4, 2, 1, and 0.5 μM (which may correspond to 800, 400, 200, 100, 50, 25, and 12.5 μM immobilized to gel beads). Jurkat T cells were used for this experiment, and results are shown in FIG. 24, with cDNA yields directly correlating with TSO concentration and plateauing at a concentration of about 16 μM. These experiments were repeated with TSO immobilized to gel beads (GB-TSO) using either 6,000 primary T cells as shown in FIG. 25A, or 2,200 Jurkat cells as shown in FIG. 25B. GB-TSO concentrations of 8, 20, 100, and 200 μM were tested, and concentrations of 100 and 200 μM showed a significant increase over the lower concentrations of 8 and 20 μM.
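The correspondence between gel-bead and in-GEM TSO concentrations stated in this example (8 μM on gel beads yielding 0.32 μM in GEM, and 32 μM corresponding to 800 μM immobilized) implies a constant 25-fold factor. The Python sketch below applies that factor; the constant and function names are hypothetical, and the factor is inferred from the stated values rather than given explicitly in the text.

```python
# Inferred from the stated pairs: 8 uM / 0.32 uM = 25 and 800 uM / 32 uM = 25.
DILUTION_FACTOR = 25.0


def in_gem_um(gel_bead_um):
    """In-GEM TSO concentration (uM) for a given gel-bead concentration (uM)."""
    return gel_bead_um / DILUTION_FACTOR


def gel_bead_um(in_gem):
    """Gel-bead TSO concentration (uM) for a given in-GEM concentration (uM)."""
    return in_gem * DILUTION_FACTOR


tested_in_gem = [32, 16, 8, 4, 2, 1, 0.5]
immobilized = [gel_bead_um(c) for c in tested_in_gem]
print(immobilized)
print(in_gem_um(8))
```

Applying the factor to the tested series reproduces the immobilized concentrations listed in the text, which suggests the two series describe the same titration in different units of reference.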

Differing enrichment schemes were tested to determine optimal enrichment methods. Using a non-GEM protocol, cDNA from 3,000 primary T cells was prepared using poly-T priming followed by enrichment using primers that anneal to TCR constant regions, yielding 38.5% VDJ mappable reads. Quantitation of this enrichment is shown in FIG. 26A. Using a gel-beads in emulsion-reverse transcription reaction (GEM-RT) protocol, cDNA from 6,000 primary T cells was prepared using poly-T priming and gel beads with a TSO at 8, 100 or 200 μM concentration, followed by enrichment using a two stage nested approach. This nested enrichment comprised PCR using outer primers annealing to the TCR alpha and beta paired with a P7 primer for 10 cycles using 60° C. extensions, followed by PCR using inner primers annealing to the TCR alpha and beta paired with a P7 primer for 10 cycles using 60° C. extensions. Results are shown in FIG. 26B, with the largest amount of enrichment observed at the lowest gel bead TSO concentration (8 μM).

To further optimize enrichment, a comparison was conducted between using P7 primers and variable region specific primers in combination with the constant region primers for cDNA amplification. Primer sequences used are shown in Table 4.

TABLE 4
Enrichment primer sequences and predicted products

V region primers
  Primer        Sequence                SEQ ID NO
  TRA-V1        ACTTGTCCAGCCTAACCTGC    160
  TRA-V2        TTACCCTGGGAGGAACCAGA    161
  TRB-V1        TTTCAGGCCACAACTCCCTT    162
  TRB-V2        CAGACAGACCATGATGCGGG    163
  TRB-V3        GCCACAACTCCCTTTTCTGG    164

Constant region primers
  Primer        Sequence                SEQ ID NO
  TRAC-inner    AGTCTCTCAGCTGGTACACG    165
  TRBC-inner    TCTGATGGCTCAAACACAGC    166

Predicted amplicon size
  Primer pair          Size
  TRA-V1/TRAC-inner    554
  TRA-V2/TRAC-inner    413
  TRB-V1/TRBC-inner    324
  TRB-V2/TRBC-inner    296
  TRB-V3/TRBC-inner    318

GEM-RT was carried out using poly-T priming followed by a template switch using 8 μM TSO-GB, followed by clean up, 15 cycles of cDNA amplification and enrichment for 20 cycles. Results shown in FIGS. 27A-27C show that using V region primers in conjunction with constant region primers specifically enriches TCR alpha (FIG. 27B) and TCR beta sequences (FIG. 27C) compared to P7 primers paired with constant region primers. FIG. 28 further shows that with specific enrichment (28C and D; V region+C region primers), as compared to general enrichment (28A and B; P7 primer+C region primer), yields of specifically enriched product were increased by increasing the amount of TSO-GB (from 8 μM to 200 μM). This is in contrast to what was observed using P7-constant region primer enrichment, which required using less TSO-GB (8 μM) in the GEM-RT reaction to produce more enriched product. Overall, using the P7-constant region enrichment allows preservation of barcode information in the subsequent sequencing reaction. This configuration yields at least 30% reads mappable to VDJ genes.

Example VII: Generating Labeled Polynucleotides

In this example, and with reference to FIGS. 29A and 29B, individual cells are lysed in partitions comprising gel bead emulsions (GEMs). GEMs, for example, can be aqueous droplets comprising gel beads. Within GEMs, a template polynucleotide comprising an mRNA molecule can be reverse transcribed by a reverse transcriptase and a primer comprising a poly(dT) region. A template switching oligo (TSO) present in the GEM, for example a TSO delivered by the gel bead, can facilitate template switching so that a resulting polynucleotide product or cDNA transcript from reverse transcription comprises the primer sequence, a reverse complement of the mRNA molecule sequence, and a sequence complementary to the template switching oligo. The template switching oligo can comprise additional sequence elements, such as a unique molecular identifier (UMI), a barcode sequence (BC), and a Read1 sequence. See FIG. 29A. In some cases, a plurality of mRNA molecules from the cell is reverse transcribed within the GEM, yielding a plurality of polynucleotide products having various nucleic acid sequences. Following reverse transcription, the polynucleotide product can be subjected to target enrichment in bulk. Prior to target enrichment, the polynucleotide product can be optionally subjected to additional reaction(s) to yield double-stranded polynucleotides. The target may comprise VDJ sequences of a T cell and/or B cell receptor gene sequence. As shown at the top of the right panel of FIG. 29A, the polynucleotide product (shown as a double-stranded molecule, but can optionally be a single-stranded transcript) can be subjected to a first target enrichment polymerase chain reaction (PCR) using a primer that hybridizes to the Read 1 region and a second primer that hybridizes to a first region of the constant region (C) of the receptor sequence (e.g., TCR or BCR). The product of the first target enrichment PCR can be subjected to a second, optional target enrichment PCR. In the second target enrichment PCR, a second primer that hybridizes to a second region of the constant region (C) of the receptor can be used. This second primer can, in some cases, hybridize to a region of the constant region that is closer to the VDJ region than the primer used in the first target enrichment PCR. Following the first and second (optional) target enrichment PCR, the resulting polynucleotide product can be further processed to add additional sequences useful for downstream analysis, for example sequencing. The polynucleotide products can be subjected to fragmentation, end repair, A-tailing, adapter ligation, and one or more clean-up/purification operations.

In some cases, a first subset of the polynucleotide products from cDNA amplification can be subjected to target enrichment (FIG. 29B, right panel) and a second subset of the polynucleotide products from cDNA amplification is not subjected to target enrichment (FIG. 29B, bottom left panel). The second subset can be subjected to further processing without enrichment to yield an unenriched, sequencing ready population of polynucleotides. For example, the second subset can be subjected to fragmentation, end repair, A-tailing, adapter ligation, and one or more clean-up/purification operations.

The labeled polynucleotides can then be subjected to sequencing analysis. Sequencing reads of the enriched polynucleotides can yield sequence information about a particular population of the mRNA molecules in the cell, whereas sequencing reads of the unenriched polynucleotides can yield sequence information about various mRNA molecules in the cell.

Example VIII: Base Error Calculations

All calculations of this example are for a single base. The terms transcript and UMI will be used interchangeably, assuming that there is a 1-1 relationship between transcripts and UMIs. Let D be all observed data (reads, qualities, UMIs) at a given base and D_(u), u=1, . . . , m be the data from UMI u. Let R be the real template base at the given position and T_(u) be the (unobserved) base at the given position on transcript/UMI u. Let R′_(ui) and R_(ui) be the real (pre-sequencing errors) and observed (post-sequencing errors) bases on the i^(th) read of UMI u and Q_(ui) be the corresponding base quality. Let p_(rt) be the probability of an RT error, and p_(pcr) be the probability of a PCR error. Let also p_(s)(Q)=10^(−Q/10) be the probability of a sequencing error given a base quality of Q. Finally, let L={A, C, G, T}. Transcripts are conditionally independent given the real template base and reads from a transcript are conditionally independent given the base of the transcript (i.e. errors occur completely independently of one another). Below, Equation I can be derived by summing over the unobserved value c of transcript u at the given position and the (also unobserved) real value d of each read at this position. In Equation I, an exponent written as a condition (e.g., R ≠ c) equals 1 when the condition holds and 0 otherwise.

$\begin{aligned}
P(D \mid R) &= \prod_{u} P(D_{u} \mid R) \\
&= \prod_{u} \sum_{c \in L} P(D_{u} \mid T_{u} = c)\, P(T_{u} = c \mid R) \\
&= \prod_{u} \sum_{c \in L} \left[ \prod_{i} P(R_{u_i} \mid T_{u} = c) \right] P(T_{u} = c \mid R) \\
&= \prod_{u} \sum_{c \in L} \left[ \prod_{i} \sum_{d \in L} P(R_{u_i} \mid R'_{u_i} = d)\, P(R'_{u_i} = d \mid T_{u} = c) \right] P(T_{u} = c \mid R) \\
&= \prod_{u} \sum_{c \in L} \left[ \prod_{i} \sum_{d \in L} \left[ \left( \frac{p_{s}(Q_{u_i})}{3} \right)^{R_{u_i} \neq d} \left( 1 - p_{s}(Q_{u_i}) \right)^{R_{u_i} = d} \left( \frac{p_{pcr}}{3} \right)^{d \neq c} \left( 1 - p_{pcr} \right)^{d = c} \right] \right] \left( \frac{p_{rt}}{3} \right)^{R \neq c} \left( 1 - p_{rt} \right)^{R = c}
\end{aligned} \qquad \text{(Equation I)}$

If it is assumed that p_(pcr) is negligible (compared to sequencing and RT errors), that is, the real (pre-sequencing errors) base of each read always equals the transcript base T_(u), the simplified form of Equation II can be derived below.

$P(D \mid R) = \prod_{u} \sum_{c \in L} \left[ \left( \frac{p_{rt}}{3} \right)^{R \neq c} \left( 1 - p_{rt} \right)^{R = c} \prod_{i} \left( \frac{p_{s}(Q_{u_i})}{3} \right)^{R_{u_i} \neq c} \left( 1 - p_{s}(Q_{u_i}) \right)^{R_{u_i} = c} \right] \qquad \text{(Equation II)}$

Let X be the called base at that position (i.e. the base in the assembled sequence). Assuming a uniform prior of 0.25 for each base in L, the probability of an error is:

${P( {R \neq X} \middle| D )} = {{1 - {P( {R =  X \middle| D } )}} = {{1 - \frac{{P( { D \middle| R  = X} )}{P( {R = X} )}}{P(D)}} = {1 - \frac{0.25*{P( D \middle| c )}}{\sum_{c \in L}{0.25*{P( D \middle| c )}}}}}}$

Devices, systems, compositions and methods of the present disclosure may be used for various applications, such as, for example, processing a single analyte (e.g., RNA, DNA, or protein) or multiple analytes (e.g., DNA and RNA, DNA and protein, RNA and protein, or RNA, DNA and protein) from a single cell. For example, a biological particle (e.g., a cell or cell bead) is partitioned in a partition (e.g., droplet), and multiple analytes from the biological particle are processed for subsequent analysis. The multiple analytes may be from the single cell. This may enable, for example, simultaneous proteomic, transcriptomic and genomic analysis of the cell.

While some embodiments of the present invention have been shown and described herein, it will be obvious to those skilled in the art that such embodiments are provided by way of example only. It is not intended that the invention be limited by the specific examples provided within the specification. While the invention has been described with reference to the aforementioned specification, the descriptions and illustrations of the embodiments herein are not meant to be construed in a limiting sense. Numerous variations, changes, and substitutions will now occur to those skilled in the art without departing from the invention. Furthermore, it shall be understood that all aspects of the invention are not limited to the specific depictions, configurations or relative proportions set forth herein which depend upon a variety of conditions and variables. It should be understood that various alternatives to the embodiments of the invention described herein may be employed in practicing the invention. It is therefore contemplated that the invention shall also cover any such alternatives, modifications, variations or equivalents. It is intended that the following claims define the scope of the invention and that methods and structures within the scope of these claims and their equivalents be covered thereby.

What is claimed:
 1. A method for nucleic acid sequencing, comprising: (a) providing a plurality of droplets, wherein a droplet of said plurality of droplets comprises (i) a ribonucleic acid (RNA) molecule comprising a nucleic acid sequence, and (ii) a bead comprising a nucleic acid barcode molecule coupled thereto, wherein said nucleic acid barcode molecule comprises a barcode sequence; (b) using said RNA molecule and said nucleic acid barcode molecule to generate a barcoded nucleic acid molecule comprising, from a 5′ end to a 3′ end, a sequence corresponding to said nucleic acid sequence of said RNA molecule and a complement of said barcode sequence; and (c) sequencing said barcoded nucleic acid molecule or a derivative thereof.
 2. The method of claim 1, wherein said RNA molecule is from a cell.
 3. The method of claim 2, wherein said droplet comprises said cell.
 4. The method of claim 3, further comprising releasing said RNA molecule from said cell prior to (b).
 5. The method of claim 1, wherein said bead comprises a plurality of nucleic acid molecules coupled thereto, wherein said plurality of nucleic acid molecules comprises said nucleic acid barcode molecule.
 6. The method of claim 5, wherein each of said plurality of nucleic acid molecules comprises said barcode sequence.
 7. The method of claim 6, wherein each of said plurality of nucleic acid molecules comprises an additional barcode sequence that varies across said plurality of nucleic acid molecules.
 8. The method of claim 1, wherein said nucleic acid barcode molecule comprises a template switching sequence.
 9. The method of claim 1, further comprising, prior to (c), subjecting said barcoded nucleic acid molecule or derivative thereof to nucleic acid amplification.
 10. The method of claim 9, wherein said nucleic acid amplification is performed subsequent to releasing said barcoded nucleic acid molecule or derivative thereof from said droplet.
 11. The method of claim 9, wherein said nucleic acid amplification is polymerase chain reaction.
 12. The method of claim 1, wherein said RNA molecule is a messenger ribonucleic acid (mRNA) molecule.
 13. The method of claim 1, wherein in (a) said droplet comprises (i) an additional nucleic acid molecule comprising an additional nucleic acid sequence, and (ii) an additional nucleic acid barcode molecule comprising an additional barcode sequence, and wherein in (b) said additional nucleic acid molecule and said additional nucleic acid barcode molecule are used to generate an additional barcoded nucleic acid molecule comprising, from a 5′ end to a 3′ end, said additional barcode sequence and an additional sequence corresponding to said additional nucleic acid sequence.
 14. The method of claim 13, wherein said additional nucleic acid barcode molecule is coupled to said bead.
 15. The method of claim 1, wherein (b) comprises extending a primer hybridized to a region at a 3′ end of the RNA molecule in a primer extension reaction, said nucleic acid barcode molecule acting as a template switching oligonucleotide in said primer extension reaction, thereby generating said barcoded nucleic acid molecule comprising, from the 5′ end to the 3′ end, the sequence corresponding to the nucleic acid sequence of the RNA molecule and the complement of the barcode sequence.
 16. The method of claim 1, wherein (b) is performed in said droplet.
 17. The method of claim 15, further comprising releasing said barcoded nucleic acid molecule or a derivative thereof from said droplet.
 18. The method of claim 1, wherein said barcoded nucleic acid molecule further comprises, towards a 5′ end, a functional sequence for permitting said barcoded nucleic acid molecule or a derivative thereof to couple to a flow cell of a sequencer.
 19. The method of claim 1, wherein said sequence is a reverse complement of said nucleic acid sequence.
 20. The method of claim 1, further comprising, prior to (c), using said barcoded nucleic acid molecule or a derivative thereof and a pair of primers to generate nucleic acid molecules having a target nucleic acid sequence.
 21. The method of claim 20, wherein said target nucleic acid sequence comprises a T cell receptor variable region sequence, a B cell receptor variable region sequence, or an immunoglobulin variable region sequence.
 22. The method of claim 20, wherein at least one of said pair of primers hybridizes to a constant region of a T cell receptor nucleic acid sequence, a constant region of a B cell receptor nucleic acid sequence, or a constant region of an immunoglobulin nucleic acid sequence.
 23. The method of claim 20, wherein said nucleic acid molecules having said target nucleic acid sequence or derivatives thereof are sequenced in (c).
 24. The method of claim 1, further comprising releasing said nucleic acid barcode molecule from said bead.
 25. The method of claim 24, wherein said nucleic acid barcode molecule is released from said bead before said barcoded nucleic acid molecule is generated.
 26. The method of claim 24, wherein said nucleic acid barcode molecule is released from said bead while said barcoded nucleic acid molecule is generated.
 27. The method of claim 24, wherein said nucleic acid barcode molecule is released from said bead after said barcoded nucleic acid molecule is generated.
 28. The method of claim 1, wherein said bead is a gel bead.
 29. The method of claim 1, wherein said barcode sequence is a combinatorial assembly of a plurality of barcode segments.
 30. The method of claim 29, wherein said plurality of barcode segments comprises at least three segments.